271 research outputs found

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies makes it difficult to gain a profound understanding of the field. In response, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin our research, related studies from the period 2007-2022 were collected from the Scopus database and then pre-processed. Following this, VOSviewer was used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and an anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, and it offers a deeper understanding of the current state and an insightful roadmap for future research in this domain. Comment: 73 pages, 28 figures

    Injecting Uncertainty in Graphs for Identity Obfuscation

    Data collected nowadays by social-networking applications create fascinating opportunities for building novel services, as well as for expanding our understanding of social structures and their dynamics. Unfortunately, publishing social-network graphs is considered an ill-advised practice due to privacy concerns. To alleviate this problem, several anonymization methods have been proposed, aiming at reducing the risk of a privacy breach on the published data, while still allowing the data to be analyzed and relevant conclusions to be drawn. In this paper we introduce a new anonymization approach that is based on injecting uncertainty in social graphs and publishing the resulting uncertain graphs. While existing approaches obfuscate graph data by adding or removing edges entirely, we propose using a finer-grained perturbation that adds or removes edges partially: this way we can achieve the same desired level of obfuscation with smaller changes in the data, thus maintaining higher utility. Our experiments on real-world networks confirm that at the same level of identity obfuscation our method provides higher usefulness than existing randomized methods that publish standard graphs. Comment: VLDB201
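
The idea of adding or removing edges "partially" can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the uniform noise distribution, and the choice to perturb every node pair are assumptions made only for illustration. Each existing edge gets an existence probability slightly below 1 and each non-edge a probability slightly above 0.

```python
import random
from itertools import combinations


def inject_uncertainty(n, edges, max_noise, rng=None):
    """Return {(i, j): probability} for every pair i < j of n nodes.

    Hypothetical simplification: each pair receives uniform noise r in
    [0, max_noise]; existing edges get probability 1 - r, non-edges get r.
    """
    if rng is None:
        rng = random.Random()
    true_edges = {tuple(sorted(e)) for e in edges}
    probs = {}
    for pair in combinations(range(n), 2):
        r = rng.uniform(0.0, max_noise)
        probs[pair] = 1.0 - r if pair in true_edges else r
    return probs


def expected_degree(probs, node):
    """Expected degree of a node in the uncertain graph."""
    return sum(p for pair, p in probs.items() if node in pair)


if __name__ == "__main__":
    uncertain = inject_uncertainty(4, [(0, 1), (1, 2)], 0.2, random.Random(7))
    for pair, p in sorted(uncertain.items()):
        print(pair, round(p, 3))
    print("expected degree of node 1:", round(expected_degree(uncertain, 1), 3))
```

With `max_noise` set to 0 the output is the original graph with probabilities of exactly 1 and 0; larger values trade utility for more obfuscation.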

    Using Metrics Suites to Improve the Measurement of Privacy in Graphs

    The file attached to this record is the author's final peer-reviewed version. The Publisher's final version can be found by following the DOI link. Social graphs are widely used in research (e.g., epidemiology) and business (e.g., recommender systems). However, sharing these graphs poses privacy risks because they contain sensitive information about individuals. Graph anonymization techniques aim to protect individual users in a graph, while graph de-anonymization aims to re-identify users. The effectiveness of anonymization and de-anonymization algorithms is usually evaluated with privacy metrics. However, it is unclear how strong existing privacy metrics are when they are used in graph privacy. In this paper, we study 26 privacy metrics for graph anonymization and de-anonymization and evaluate their strength in terms of three criteria: monotonicity indicates whether the metric indicates lower privacy for stronger adversaries; for within-scenario comparisons, evenness indicates whether metric values are spread evenly; and for between-scenario comparisons, shared value range indicates whether metrics use a consistent value range across scenarios. Our extensive experiments indicate that no single metric fulfills all three criteria perfectly. We therefore use methods from multi-criteria decision analysis to aggregate multiple metrics in a metrics suite, and we show that these metrics suites improve monotonicity compared to the best individual metric. This important result enables more monotonic, and thus more accurate, evaluations of new graph anonymization and de-anonymization algorithms.

    Multimodal Network Alignment

    A multimodal network encodes relationships between the same set of nodes in multiple settings, and network alignment is a powerful tool for transferring information and insight between a pair of networks. We propose a method for multimodal network alignment that computes a matrix indicating the alignment, but produces the result directly as a low-rank factorization. We then propose new methods to compute approximate maximum weight matchings of low-rank matrices to produce an alignment. We evaluate our approach by applying it to synthetic networks and use it to de-anonymize a multimodal transportation network. Comment: 14 pages, 6 figures, SIAM Data Mining 201

    Utility-Based Privacy Preserving Data Publishing

    Advances in data collection techniques and the need for automation have triggered the proliferation of a huge amount of data. This exponential increase in the collection of personal information has for some time represented a serious threat to privacy. With the advancement of technologies for data storage, data mining, machine learning, social networking and cloud computing, the problem is further fueled. Privacy is a fundamental right of every human being and needs to be preserved. As a counterbalance to the socio-technical transformations, most nations have both general policies on preserving privacy and specific legislation to control access to and use of data. Privacy preserving data publishing is the ability to control the dissemination and use of one's personal information. Mere publishing (or sharing) of original data in raw form results in identity disclosure through linkage attacks. To overcome linkage attacks, the techniques of statistical disclosure control are employed. One such approach is k-anonymity, which reduces data across a set of key variables to a set of classes. In a k-anonymized dataset each record is indistinguishable from at least k-1 others, meaning that an attacker cannot link the data records to population units with certainty, thus reducing the probability of disclosure. Algorithms that have been proposed to enforce k-anonymity are Samarati's algorithm and Sweeney's Datafly algorithm. Both of these algorithms adhere to full domain generalization with global recoding. These methods have a tradeoff between utility, computing time and information loss. A good privacy preserving technique should ensure a balance of utility and privacy, giving good performance and level of uncertainty. In this thesis, we propose an improved greedy heuristic that maintains a balance between utility, privacy, computing time and information loss. Given a dataset and k, the dataset can be transformed into a k-anonymous dataset by the above-mentioned schemes.
One of the challenges is to find the best value of k when the dataset is provided. In this thesis, a scheme has been proposed to achieve the best value of k for a given dataset. The k-anonymity scheme suffers from the homogeneity attack. As a result, the l-diversity scheme was developed. It requires that each equivalence class contain at least l diverse values of the sensitive attribute. The l-diversity scheme suffers from the background knowledge attack. To address this problem, the t-closeness scheme was proposed. The t-closeness principle states that the distance between the distribution of the sensitive attribute in an equivalence class and its distribution in the whole table should not exceed t. The drawback of this scheme is that the distance metric deployed in constructing a table satisfying t-closeness does not follow the distance characteristics. In this thesis, we have deployed an alternative distance metric, namely the Hellinger metric, for constructing a t-closeness table. The t-closeness scheme with this alternative distance metric performed better with respect to the discernibility metric and computing time. The k-anonymity, l-diversity and t-closeness schemes can be used to anonymize the dataset before publishing (releasing or sharing). This is generally in a static environment. There are also data that need to be published in a dynamic environment. One such example is a social network. Anonymizing social networks poses great challenges. Solutions suggested to date do not consider the utility of the data while anonymizing. In this thesis, we propose a novel scheme to anonymize the users depending on their importance and take utility into consideration. The importance of a node is decided by centrality and prestige measures. Hence, the utility and privacy of the users are balanced.
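
The k-anonymity and t-closeness conditions described in this abstract can be checked on a toy table as in the following minimal sketch. It is not the thesis's algorithm: the record layout, the function names, and the example values are assumptions for illustration. Only the definitions themselves (at least k records per equivalence class, and the Hellinger distance between a class's sensitive-value distribution and the table's) are standard.

```python
from collections import Counter, defaultdict
from math import sqrt


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[q] for q in quasi_identifiers)].append(rec)
    return classes


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class holds at least k records."""
    groups = equivalence_classes(records, quasi_identifiers).values()
    return all(len(g) >= k for g in groups)


def distribution(records, sensitive):
    """Empirical distribution of the sensitive attribute."""
    counts = Counter(rec[sensitive] for rec in records)
    total = sum(counts.values())
    return {value: c / total for value, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    support = set(p) | set(q)
    s = sum((sqrt(p.get(v, 0.0)) - sqrt(q.get(v, 0.0))) ** 2 for v in support)
    return sqrt(s / 2.0)


def max_class_distance(records, quasi_identifiers, sensitive):
    """Largest class-to-table distance; t-closeness holds if this is <= t."""
    overall = distribution(records, sensitive)
    groups = equivalence_classes(records, quasi_identifiers).values()
    return max(hellinger(distribution(g, sensitive), overall) for g in groups)


if __name__ == "__main__":
    # Hypothetical, already generalized table.
    table = [
        {"age": "30-39", "zip": "476**", "disease": "flu"},
        {"age": "30-39", "zip": "476**", "disease": "cancer"},
        {"age": "40-49", "zip": "479**", "disease": "flu"},
        {"age": "40-49", "zip": "479**", "disease": "flu"},
    ]
    qi = ["age", "zip"]
    print("2-anonymous:", is_k_anonymous(table, qi, 2))
    print("max Hellinger distance:", max_class_distance(table, qi, "disease"))
```

The second equivalence class in the toy table contains only "flu", which is the homogeneity problem the abstract mentions: the table is 2-anonymous, yet that class reveals the sensitive value.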

    Differentially Private Exponential Random Graphs

    We propose methods to release and analyze synthetic graphs in order to protect privacy of individual relationships captured by the social network. Proposed techniques aim at fitting and estimating a wide class of exponential random graph models (ERGMs) in a differentially private manner, and thus offer rigorous privacy guarantees. More specifically, we use the randomized response mechanism to release networks under ϵ-edge differential privacy. To maintain utility for statistical inference, treating the original graph as missing, we propose a way to use likelihood based inference and Markov chain Monte Carlo (MCMC) techniques to fit ERGMs to the produced synthetic networks. We demonstrate the usefulness of the proposed techniques on a real data example. Comment: minor edit
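
Randomized response on edges can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes one common parameterization in which every node pair is flipped independently with the same probability 1 / (1 + e^ϵ), and the function name and graph representation are hypothetical.

```python
import math
import random
from itertools import combinations


def randomized_response_graph(n, edges, epsilon, rng=None):
    """Release a noisy undirected graph on nodes 0..n-1.

    Assumption: each pair (i, j) with i < j keeps its true edge indicator
    with probability e^eps / (1 + e^eps) and is flipped otherwise, so the
    likelihood ratio for any single pair is bounded by e^eps.
    """
    if rng is None:
        rng = random.Random()
    flip_prob = 1.0 / (1.0 + math.exp(epsilon))
    true_edges = {tuple(sorted(e)) for e in edges}
    released = set()
    for pair in combinations(range(n), 2):
        present = pair in true_edges
        if rng.random() < flip_prob:
            present = not present  # flip this pair's edge indicator
        if present:
            released.add(pair)
    return released


if __name__ == "__main__":
    noisy = randomized_response_graph(5, [(0, 1), (1, 2), (3, 4)], 1.0)
    print(sorted(noisy))
```

Smaller values of ϵ flip more pairs and give stronger privacy. The released graph is biased toward density in sparse networks, which is one reason the abstract treats the original graph as missing data during inference rather than fitting the model to the noisy graph directly.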