44 research outputs found
A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions
In recent decades, social network anonymization has become a crucial research
field due to its pivotal role in preserving users' privacy. However, the high
diversity of approaches introduced in relevant studies poses a challenge to
gaining a profound understanding of the field. In response to this, the current
study presents an exhaustive and well-structured bibliometric analysis of the
social network anonymization field. To begin our research, related studies from
the period 2007-2022 were collected from the Scopus database and then
pre-processed. Following this, VOSviewer was used to visualize the network
of authors' keywords. Subsequently, extensive statistical and network analyses
were performed to identify the most prominent keywords and trending topics.
Additionally, the application of co-word analysis through SciMAT and the
Alluvial diagram allowed us to explore the themes of social network
anonymization and scrutinize their evolution over time. These analyses
culminated in an innovative taxonomy of the existing approaches and
anticipation of potential trends in this domain. To the best of our knowledge,
this is the first bibliometric analysis in the social network anonymization
field, which offers a deeper understanding of the current state and an
insightful roadmap for future research in this domain. Comment: 73 pages, 28 figures
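The co-word analysis mentioned in this abstract rests on counting how often two author keywords appear on the same paper. A minimal sketch of that counting step follows, assuming keywords arrive as one list per paper; the function name `cooccurrence_counts` and the sample keyword lists are hypothetical and are not taken from the study or from VOSviewer or SciMAT.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how often each unordered keyword pair appears on the same paper."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and deduplicate so a pair is counted once per paper.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical author-keyword lists, not drawn from the study's dataset.
papers = [
    ["Privacy", "Social network", "k-anonymity"],
    ["privacy", "differential privacy", "social network"],
    ["k-anonymity", "privacy"],
]
counts = cooccurrence_counts(papers)
for pair, n in counts.most_common(2):
    print(pair, n)
```

The resulting pair counts are the edge weights of a keyword network, which a visualization tool can then lay out and cluster.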
Privacy and spectral analysis of social network randomization
Social networks are of significant importance in various application domains. Understanding the general properties of real social networks has gained much attention due to the proliferation of networked data. Many applications of networks such as anonymous web browsing and data publishing require relationship anonymity due to the sensitive, stigmatizing, or confidential nature of the relationship. One general approach for this problem is to randomize the edges in true networks, and only release the randomized networks for data analysis. Our research focuses on the development of randomization techniques such that the released networks can preserve data utility while preserving data privacy.
Data privacy refers to the sensitive information in the network data. The released network data after a simple randomization could incur various disclosures including identity disclosure, link disclosure and attribute disclosure. Data utility refers to the information, features, and patterns contained in the network data. Many important features may not be preserved in the released network data after a simple randomization. In this dissertation, we develop advanced randomization techniques to better preserve data utility of the network data while still preserving data privacy. Specifically, we develop two advanced randomization strategies that can preserve the spectral properties of the network or can preserve the real features (e.g., modularity) of the network. We quantify to what extent various randomization techniques can protect data privacy when attackers use different attacks or have different background knowledge. To measure the data utility, we also develop a consistent spectral framework to measure the non-randomness (importance) of the edges, nodes, and the overall graph. Exploiting the spectral space of network topology, we further develop fraud detection techniques for various collaborative attacks in social networks. Extensive theoretical analysis and empirical evaluations are conducted to demonstrate the efficacy of our developed techniques.
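The "simple randomization" this abstract takes as its starting point can be illustrated with a basic add/delete scheme: remove k true edges at random and add k false edges at random, so the edge count is unchanged. The sketch below is only that baseline, under the assumption that nodes are labeled 0..n-1; it is not the spectrum-preserving or feature-preserving strategy developed in the dissertation, and the name `rand_add_del` is hypothetical.

```python
import random


def rand_add_del(edges, num_nodes, k, seed=None):
    """Delete k random true edges and add k random false edges.

    Edges are undirected pairs of node ids in range(num_nodes).
    The released graph keeps the original number of edges.
    """
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    non_edges = [
        (u, v)
        for u in range(num_nodes)
        for v in range(u + 1, num_nodes)
        if (u, v) not in edge_set
    ]
    if k > len(edge_set) or k > len(non_edges):
        raise ValueError("k exceeds the number of edges or non-edges")
    removed = rng.sample(sorted(edge_set), k)  # true edges to hide
    added = rng.sample(non_edges, k)           # false edges to inject
    return (edge_set - set(removed)) | set(added)


original = {(0, 1), (1, 2), (2, 3), (3, 4)}
released = rand_add_del(original, num_nodes=5, k=2, seed=7)
print(sorted(released))
```

Larger k hides more true relationships but also distorts more structure, which is the privacy versus utility tension the abstract describes.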
Learning to de-anonymize social networks
Releasing anonymized social network data for analysis has been a popular idea among data providers. Despite evidence to the contrary, the belief that anonymization will solve the privacy problem in practice refuses to die. This dissertation contributes to the field of social graph de-anonymization by demonstrating that even automated models can be quite successful in breaching the privacy of such datasets. We propose novel machine-learning-based techniques to learn the identities of nodes in social graphs, thereby automating manual, heuristic-based attacks. Our work extends the vast literature of social graph de-anonymization attacks by systematizing them. We present a random-forest-based classifier which uses structural node features based on neighborhood degree distribution to predict their similarity. Using these simple and efficient features we design versatile and expressive learning models which can learn the de-anonymization task just from a few examples. Our evaluation establishes their efficacy in transforming de-anonymization to a learning problem. The learning is transferable in that the model can attack one graph after being trained on another. Moving on, we demonstrate the versatility and greater applicability of the proposed model by using it to solve the long-standing problem of benchmarking social graph anonymization schemes. Our framework bridges a fundamental research gap by making cheap, quick and automated analysis of anonymization schemes possible, without even requiring their full description. The benchmark is based on comparison of structural information leakage vs. utility preservation. We study the trade-off of anonymity vs. utility for six popular anonymization schemes including those promising k-anonymity. Our analysis shows that none of the schemes are fit for the purpose.
Finally, we present an end-to-end social graph de-anonymization attack which uses the proposed machine learning techniques to recover node mappings across intersecting graphs. Our attack enhances the state of the art in graph de-anonymization by demonstrating better performance than all the other attacks including those that use seed knowledge. The attack is seedless and heuristic free, which demonstrates the superiority of machine learning techniques as compared to hand-selected parametric attacks.
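The abstract describes structural node features based on neighborhood degree distribution. One plausible reading, sketched below, is a fixed-length histogram of the degrees of each node's neighbors. The binning parameters `num_bins` and `bin_width` and the function name are assumptions made for illustration; the dissertation's actual feature definition may differ, and the classifier itself is not shown.

```python
from collections import defaultdict


def neighbor_degree_features(edges, num_bins=4, bin_width=2):
    """Map each node to a histogram of its neighbors' degrees.

    Bin i counts neighbors whose degree lies in
    [i * bin_width + 1, (i + 1) * bin_width]; the last bin also
    absorbs all larger degrees.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)
    features = {}
    for node, neighbors in adjacency.items():
        hist = [0] * num_bins
        for nbr in neighbors:
            degree = len(adjacency[nbr])
            hist[min((degree - 1) // bin_width, num_bins - 1)] += 1
        features[node] = hist
    return features


# Small hypothetical graph: node 0 links to 1, 2, 3, and 1 links to 2.
features = neighbor_degree_features([(0, 1), (0, 2), (0, 3), (1, 2)])
print(features[0], features[1], features[3])
```

Feature vectors of this kind for a pair of nodes, one from each graph, could then be fed to a classifier that predicts whether the two nodes are the same individual.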
Preventing active re-identification attacks on social graphs via sybil subgraph obfuscation
Active re-identification attacks constitute a serious threat to privacy-preserving social graph publication, because of the ability of active adversaries to leverage fake accounts, a.k.a. sybil nodes, to enforce structural patterns that can be used to re-identify their victims on anonymised graphs. Several formal privacy properties have been enunciated with the purpose of characterising the resistance of a graph against active attacks. However, anonymisation methods devised on the basis of these properties have so far been able to address only restricted special cases, where the adversaries are assumed to leverage a very small number of sybil nodes. In this paper, we present a new probabilistic interpretation of active re-identification attacks on social graphs. Unlike the aforementioned privacy properties, which model the protection from active adversaries as the task of making victim nodes indistinguishable in terms of their fingerprints with respect to all potential attackers, our new formulation introduces a more complete view, where the attack is countered by jointly preventing the attacker from retrieving the set of sybil nodes, and from using these sybil nodes for re-identifying the victims. Under the new formulation, we show that k-symmetry, a privacy property introduced in the context of passive attacks, provides a sufficient condition for the protection against active re-identification attacks leveraging an arbitrary number of sybil nodes. Moreover, we show that the algorithm K-Match, originally devised for efficiently enforcing the related notion of k-automorphism, also guarantees k-symmetry. Empirical results on real-life and synthetic graphs demonstrate that our formulation allows, for the first time, the publication of anonymised social graphs (with formal privacy guarantees) that effectively resist the strongest active re-identification attack reported in the literature, even when it leverages a large number of sybil nodes.
Privacy-preserving social network analysis
Data privacy in social networks is a growing concern that threatens to limit access to important information contained in these data structures. Analysis of the graph structure of social networks can provide valuable information for revenue generation and social science research, but unfortunately, ensuring this analysis does not violate individual privacy is difficult. Simply removing obvious identifiers from graphs or even releasing only aggregate results of analysis may not provide sufficient protection. Differential privacy is an alternative privacy model, popular in data-mining over tabular data, that uses noise to obscure individuals' contributions to aggregate results and offers a strong mathematical guarantee that individuals' presence in the data-set is hidden. Analyses that were previously vulnerable to identification of individuals and extraction of private data may be safely released under differential-privacy guarantees. However, existing adaptations of differential privacy to social network analysis are often complex and have considerable impact on the utility of the results, making it less likely that they will see widespread adoption in the social network analysis world. In fact, social scientists still often use the weakest form of privacy protection, simple anonymization, in their social network analysis publications.
We review the existing work in graph-privatization, including the two existing standards for adapting differential privacy to network data. We then propose contributor-privacy and partition-privacy, novel standards for differential privacy over network data, and introduce simple, powerful private algorithms using these standards for common network analysis techniques that were infeasible to privatize under previous differential privacy standards.
We also ensure that privatized social network analysis does not violate the level of rigor required in social science research, by proposing a method of determining statistical significance for paired samples under differential privacy using the Wilcoxon Signed-Rank Test, which is appropriate for non-normally distributed data.
Finally, we return to formally consider the case where differential privacy is not applied to data. Naive, deterministic approaches to privacy protection, including anonymization and aggregation of data, are often used in real world practice. De-anonymization research demonstrates that some naive approaches to privacy are highly vulnerable to reidentification attacks, and none of these approaches offer the robust guarantee of differential privacy. However, we propose that these methods fall across a range of protection: Some are better than others. In cases where adding noise to data is especially problematic, or acceptance and adoption of differential privacy is especially slow, it is critical to have a formal understanding of the alternatives. We define De Facto Privacy, a metric for comparing the relative privacy protection provided by deterministic approaches.
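The abstract's description of differential privacy as using noise to obscure individual contributions can be made concrete with the standard Laplace mechanism applied to an edge count. The sketch below assumes the edge-level notion of neighboring graphs, where adding or removing one edge changes the count by at most 1; it does not implement the contributor-privacy or partition-privacy standards the dissertation proposes, and `private_edge_count` is a hypothetical name.

```python
import random


def private_edge_count(edges, epsilon, rng=None):
    """Return the number of undirected edges plus Laplace noise.

    Assumes edge-level differential privacy: the sensitivity of the
    edge count is 1, so the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = len({tuple(sorted(e)) for e in edges})
    sensitivity = 1.0
    rate = epsilon / sensitivity
    # The difference of two i.i.d. exponential draws with this rate is a
    # Laplace sample with mean 0 and scale sensitivity / epsilon.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


triangle = [(0, 1), (1, 2), (0, 2)]
print(private_edge_count(triangle, epsilon=1.0, rng=random.Random(0)))
```

Smaller values of epsilon give stronger privacy and noisier answers, which is the utility cost the abstract refers to.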
Efficient Algorithms for Attributed Graph Alignment with Vanishing Edge Correlation
Graph alignment refers to the task of finding the vertex correspondence
between two positively correlated graphs. Extensive study has been done on
polynomial-time algorithms for the graph alignment problem under the
Erdős–Rényi graph pair model, where the two graphs are
Erdős–Rényi graphs with edge probability , correlated
under certain vertex correspondence. To achieve exact recovery of the vertex
correspondence, all existing algorithms at least require the edge correlation
coefficient between the two graphs to satisfy
, where is Otter's
tree-counting constant. Moreover, it is conjectured in [1] that no
polynomial-time algorithm can achieve exact recovery under weak edge
correlation .
In this paper, we show that with a vanishing amount of additional attribute
information, exact recovery is polynomial-time feasible under vanishing edge
correlation . We identify a local tree
structure, which incorporates one layer of user information and one layer of
attribute information, and apply the subgraph counting technique to such
structures. A polynomial-time algorithm is proposed that recovers the vertex
correspondence for all but a vanishing fraction of vertices. We then further
refine the algorithm output to achieve exact recovery. The motivation for
considering additional attribute information comes from the widely available
side information in real-world applications, such as the user's birthplace and
educational background on LinkedIn and Twitter social networks.
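To make the graph pair setting concrete, the sketch below generates two correlated Erdős–Rényi graphs by independently subsampling a common parent graph and then relabeling the second graph with a hidden permutation, which is one common way such pairs are constructed. The parameter names `p` (parent edge probability) and `s` (subsampling probability) are assumptions, since the abstract's own symbols are missing here, and the attribute layer and the alignment algorithm are not shown.

```python
import random


def correlated_er_pair(n, p, s, seed=None):
    """Sample two correlated Erdős–Rényi graphs and the hidden correspondence.

    Each vertex pair is a parent edge with probability p; each graph keeps
    a parent edge independently with probability s. Vertex u of the first
    graph corresponds to vertex perm[u] of the second.
    """
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)  # hidden vertex correspondence
    g1, g2 = set(), set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:      # edge exists in the parent graph
                if rng.random() < s:  # kept in the first graph
                    g1.add((u, v))
                if rng.random() < s:  # kept in the second, relabeled
                    a, b = perm[u], perm[v]
                    g2.add((min(a, b), max(a, b)))
    return g1, g2, perm


g1, g2, perm = correlated_er_pair(n=50, p=0.2, s=0.8, seed=1)
shared = sum(
    1 for u, v in g1
    if (min(perm[u], perm[v]), max(perm[u], perm[v])) in g2
)
print(len(g1), len(g2), shared)
```

Graph alignment is the task of recovering `perm` from the two edge sets alone; lower values of `s` weaken the correlation and make that recovery harder.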