
    Mining Frequent Graph Patterns with Differential Privacy

    Discovering frequent graph patterns in a graph database offers valuable information in a variety of applications. However, if the graph dataset contains sensitive data about individuals, such as mobile phone-call graphs and web-click graphs, releasing the discovered frequent patterns may threaten the privacy of those individuals. Differential privacy has recently emerged as the de facto standard for private data analysis due to its provable privacy guarantee. In this paper we propose the first differentially private algorithm for mining frequent graph patterns. We first show that previous techniques for differentially private discovery of frequent itemsets cannot be applied to mining frequent graph patterns, due to the inherent complexity of handling structural information in graphs. We then address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling based algorithm. Unlike previous work on frequent itemset mining, our techniques do not rely on the output of a non-private mining algorithm. Instead, we observe that both frequent graph pattern mining and the guarantee of differential privacy can be unified within a single MCMC sampling framework. In addition, we establish the privacy and utility guarantees of our algorithm and propose an efficient neighboring pattern counting technique. Experimental results show that the proposed algorithm outputs frequent patterns with good precision.
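
    The abstract leaves the sampler's details open; as a rough, self-contained sketch of how pattern mining and differential privacy can share one MCMC framework, the Python toy below runs a Metropolis-Hastings chain whose target roughly follows an exponential-mechanism-style distribution over patterns. The edge-set representation of patterns and the helpers support and random_neighbor are illustrative assumptions, not the paper's construction.

    import math
    import random

    # Illustrative only: a Metropolis-Hastings chain that favors patterns with
    # high support, in the spirit of the exponential mechanism
    # exp(eps * support / (2 * sensitivity)). Patterns are edge sets and
    # containment is an edge-subset test, a simplification of subgraph matching.

    def support(pattern, graph_db):
        """Number of database graphs whose edge set contains the pattern."""
        return sum(1 for g in graph_db if pattern <= g)

    def random_neighbor(pattern, edge_pool):
        """Propose a neighboring pattern by adding or removing a single edge."""
        edges = set(pattern)
        if edges and random.random() < 0.5:
            edges.remove(random.choice(sorted(edges)))
        else:
            edges.add(random.choice(edge_pool))
        return frozenset(edges)

    def mcmc_sample_pattern(graph_db, edge_pool, eps, sensitivity=1.0, steps=5000):
        current = frozenset([random.choice(edge_pool)])
        cur_score = support(current, graph_db)
        for _ in range(steps):
            cand = random_neighbor(current, edge_pool)
            cand_score = support(cand, graph_db)
            # Metropolis acceptance for the target exp(eps * score / (2 * sensitivity))
            if random.random() < min(1.0, math.exp(eps * (cand_score - cur_score) / (2 * sensitivity))):
                current, cur_score = cand, cand_score
        return current, cur_score

    # Toy usage: each graph in the database is a frozenset of edges.
    db = [frozenset({("a", "b"), ("b", "c")}),
          frozenset({("a", "b"), ("c", "d")}),
          frozenset({("a", "b")})]
    pool = sorted({e for g in db for e in g})
    print(mcmc_sample_pattern(db, pool, eps=1.0))

    Note that the chain never consults the output of a non-private miner: its only access to the data is the support count inside the acceptance ratio.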

    Time Distortion Anonymization for the Publication of Mobility Data with High Utility

    An increasing amount of mobility data is being collected every day by different means, such as mobile applications or crowd-sensing campaigns. This data is sometimes published after applying simple anonymization techniques (e.g., replacing users' names with identifiers), which can lead to severe threats to the privacy of the participating users. The literature contains more sophisticated anonymization techniques, often based on adding noise to the spatial data. However, these techniques compromise either privacy, if the added noise is too weak, or the utility of the data, if the added noise is too strong. In this paper we investigate an alternative solution, which builds on time distortion instead of spatial distortion. Specifically, our contribution lies in (1) the introduction of the concept of time distortion to anonymize mobility datasets, (2) Promesse, a protection mechanism implementing this concept, and (3) a practical study of Promesse compared to two representative spatial distortion mechanisms, namely Wait For Me, which enforces k-anonymity, and Geo-Indistinguishability, which enforces differential privacy. We evaluate our mechanism experimentally on three real-life datasets. Our results show that time distortion reduces the number of points of interest that an adversary can retrieve to under 3%, while the introduced spatial error is almost null and the distortion introduced on the results of range queries is kept under 13% on average. (Comment: in the 14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, Aug 2015, Helsinki, Finland)
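
    To make the idea of time distortion concrete, here is a hypothetical Python sketch that leaves every recorded position unchanged but re-assigns evenly spaced timestamps along the trace, so dwell times at points of interest are no longer visible. It is only a guess at the flavor of Promesse; the actual mechanism and its parameters are defined in the paper.

    # Hypothetical time-distortion sketch: spatial coordinates are untouched,
    # only the timestamps are replaced by a uniform schedule over the trace.

    def distort_time(trace):
        """trace: list of (timestamp, lat, lon) tuples, sorted by timestamp."""
        if len(trace) < 2:
            return list(trace)
        t_start, t_end = trace[0][0], trace[-1][0]
        step = (t_end - t_start) / (len(trace) - 1)
        return [(t_start + i * step, lat, lon)
                for i, (_, lat, lon) in enumerate(trace)]

    # Toy usage: the one-hour stop at the second location is flattened away.
    trace = [(0, 48.850, 2.350),
             (60, 48.860, 2.352),
             (3660, 48.860, 2.352),   # long dwell at a point of interest
             (3720, 48.870, 2.360)]
    print(distort_time(trace))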

    Privacy via the Johnson-Lindenstrauss Transform

    Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask whether it is possible for party A to publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method involves projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy: the more privacy is desired, the larger the variance of the Gaussian noise must be. Further, we show how to approximate the true distances between users using only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods, such as randomized response, and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is broader than preserving distances, this work shows that private distance computation is an achievable goal. (Comment: 24 pages)
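
    As a rough illustration of the projection-plus-noise pipeline described above, the sketch below builds a sparse random +/-1 projection, releases noisy low-dimensional vectors, and estimates squared distances by subtracting the expected noise contribution. The sparsity level, noise scale sigma, and dimensions are illustrative assumptions; they would have to be set as the paper prescribes for an actual differential privacy guarantee.

    import numpy as np

    rng = np.random.default_rng(0)

    def sparse_jl_matrix(d, k, density=0.1):
        """k x d projection: entries are 0 w.p. 1-density, else +/- 1/sqrt(k*density)."""
        mask = rng.random((k, d)) < density
        signs = rng.choice([-1.0, 1.0], size=(k, d))
        return mask * signs / np.sqrt(k * density)

    def publish(x, P, sigma):
        """Release the noisy low-dimensional representation of bit vector x."""
        return P @ x + rng.normal(0.0, sigma, size=P.shape[0])

    def estimate_sq_distance(y_u, y_v, sigma):
        """Estimate ||x_u - x_v||^2 from two noisy projections. The noise in
        y_u - y_v has variance 2*sigma^2 per coordinate, so its expected
        contribution is subtracted."""
        k = len(y_u)
        return float(np.sum((y_u - y_v) ** 2) - 2 * k * sigma ** 2)

    # Toy usage: 1000-dimensional bit vectors projected down to k = 64 dimensions.
    d, k, sigma = 1000, 64, 0.5
    P = sparse_jl_matrix(d, k)
    x_u = rng.integers(0, 2, d).astype(float)
    x_v = rng.integers(0, 2, d).astype(float)
    y_u, y_v = publish(x_u, P, sigma), publish(x_v, P, sigma)
    print(np.sum((x_u - x_v) ** 2), estimate_sq_distance(y_u, y_v, sigma))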

    Measuring Membership Privacy on Aggregate Location Time-Series

    While location data is extremely valuable for various applications, disclosing it poses serious threats to individuals' privacy. To limit such concerns, organizations often provide analysts with aggregate time-series that indicate, e.g., how many people are in a location during a given time interval, rather than raw individual traces. In this paper, we perform a measurement study to understand Membership Inference Attacks (MIAs) on aggregate location time-series, where an adversary tries to infer whether a specific user contributed to the aggregates. We find that the volume of contributed data, as well as the regularity and particularity of users' mobility patterns, play a crucial role in the attack's success. We experiment with a wide range of defenses based on generalization, hiding, and perturbation, and evaluate their ability to thwart the attack vis-a-vis the utility loss they introduce for various mobility analytics tasks. Our results show that some defenses fail across the board, while others work for specific tasks on aggregate location time-series. For instance, suppressing small counts can be used for ranking hotspots, and data generalization for forecasting traffic, hotspot discovery, and map inference, while sampling is effective for location labeling and anomaly detection when the dataset is sparse. Differentially private techniques provide reasonable accuracy only in very specific settings, e.g., discovering hotspots and forecasting their traffic, and more so when using weaker privacy notions like crowd-blending privacy. Overall, our measurements show that there is no single generic defense that preserves the utility of the analytics for arbitrary applications, and they provide useful insights regarding the disclosure of sanitized aggregate location time-series.
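
    As one concrete example of the defenses evaluated above, here is a hypothetical sketch of suppressing small counts before an aggregate location time-series is released; the threshold and the data layout are illustrative, not taken from the paper.

    # Hypothetical sketch of the "suppress small counts" defense: cells of the
    # aggregate whose count falls below a threshold are withheld from the release.

    def suppress_small_counts(aggregates, threshold=5):
        """aggregates: dict mapping (location, time_slot) -> visitor count."""
        return {cell: count for cell, count in aggregates.items() if count >= threshold}

    # Toy usage: sparsely visited cells, which reveal the most about individual
    # members, are dropped; popular cells useful for hotspot ranking remain.
    aggregates = {("station_A", "08:00"): 120,
                  ("cafe_B", "08:00"): 2,
                  ("park_C", "09:00"): 7}
    print(suppress_small_counts(aggregates))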