283 research outputs found

    Injecting Uncertainty in Graphs for Identity Obfuscation

    Full text link
    Data collected nowadays by social-networking applications create fascinating opportunities for building novel services, as well as expanding our understanding about social structures and their dynamics. Unfortunately, publishing social-network graphs is considered an ill-advised practice due to privacy concerns. To alleviate this problem, several anonymization methods have been proposed, aiming at reducing the risk of a privacy breach on the published data, while still allowing to analyze them and draw relevant conclusions. In this paper we introduce a new anonymization approach that is based on injecting uncertainty in social graphs and publishing the resulting uncertain graphs. While existing approaches obfuscate graph data by adding or removing edges entirely, we propose using a finer-grained perturbation that adds or removes edges partially: this way we can achieve the same desired level of obfuscation with smaller changes in the data, thus maintaining higher utility. Our experiments on real-world networks confirm that at the same level of identity obfuscation our method provides higher usefulness than existing randomized methods that publish standard graphs.Comment: VLDB201

    Anonymizing Social Graphs via Uncertainty Semantics

    Full text link
    Rather than anonymizing social graphs by generalizing them to super nodes/edges or adding/removing nodes and edges to satisfy given privacy parameters, recent methods exploit the semantics of uncertain graphs to achieve privacy protection of participating entities and their relationship. These techniques anonymize a deterministic graph by converting it into an uncertain form. In this paper, we propose a generalized obfuscation model based on uncertain adjacency matrices that keep expected node degrees equal to those in the unanonymized graph. We analyze two recently proposed schemes and show their fitting into the model. We also point out disadvantages in each method and present several elegant techniques to fill the gap between them. Finally, to support fair comparisons, we develop a new tradeoff quantifying framework by leveraging the concept of incorrectness in location privacy research. Experiments on large social graphs demonstrate the effectiveness of our schemes

    A Maximum Variance Approach for Graph Anonymization

    Get PDF
    Best Paper AwardInternational audienceUncertain graphs, a form of uncertain data, have recently attracted a lot of attention as they can represent inherent uncertainty in collected data. The uncertain graphs pose challenges to conventional data processing techniques and open new research directions. Going in the reserve direction, this paper focuses on the problem of anonymizing a deterministic graph by converting it into an uncertain form. The paper first analyzes drawbacks in a recent uncertainty-based anonymization scheme and then proposes Maximum Variance, a novel approach that provides better tradeoff between privacy and utility. Towards a fair com-parison between the anonymization schemes on graphs, the second con-tribution of this paper is to describe a quantifying framework for graph anonymization by assessing privacy and utility scores of typical schemes in a unified space. The extensive experiments show the effectiveness and efficiency of Maximum Variance on three large real graphs

    Node Classification in Uncertain Graphs

    Full text link
    In many real applications that use and analyze networked data, the links in the network graph may be erroneous, or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the links may affect the final results of the classification process. If the information about link reliability is not used explicitly, the classification accuracy in the underlying network may be affected adversely. In this paper, we focus on situations that require the analysis of the uncertainty that is present in the graph structure. We study the novel problem of node classification in uncertain graphs, by treating uncertainty as a first-class citizen. We propose two techniques based on a Bayes model and automatic parameter selection, and show that the incorporation of uncertainty in the classification process as a first-class citizen is beneficial. We experimentally evaluate the proposed approach using different real data sets, and study the behavior of the algorithms under different conditions. The results demonstrate the effectiveness and efficiency of our approach

    Risk-Averse Matchings over Uncertain Graph Databases

    Full text link
    A large number of applications such as querying sensor networks, and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward, and low risk? This problem naturally arises in the context of several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs that we introduce in this work. Our model generalizes probabilistic models used in prior work, and captures both continuous and discrete probability distributions, thus allowing to handle privacy related applications that inject appropriately distributed noise to (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For the case of uncertain weighted graphs, we provide a 13\frac{1}{3}-approximation algorithm, and a 15\frac{1}{5}-approximation algorithm with near optimal run time. For the case of uncertain weighted hypergraphs, we provide a Ω(1k)\Omega(\frac{1}{k})-approximation algorithm, where kk is the rank of the hypergraph (i.e., any hyperedge includes at most kk nodes), that runs in almost (modulo log factors) linear time. We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe in a controlled setting interesting findings on the trade-off between reward, and risk. We also provide an application of our formulation for providing recommendations of teams that are likely to collaborate, and have high impact.Comment: 25 page
    • …
    corecore