
    Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge

    This paper describes the winning entry to the IJCNN 2011 Social Network Challenge run by Kaggle.com. The goal of the contest was to promote research on real-world link prediction, and the dataset was a graph obtained by crawling the popular Flickr social photo-sharing website, with user identities scrubbed. By de-anonymizing much of the competition test set using our own Flickr crawl, we were able to effectively game the competition. Our attack represents a new application of de-anonymization to gaming machine learning contests, suggesting changes in how future competitions should be run. We introduce a new simulated annealing-based weighted graph matching algorithm for the seeding step of de-anonymization. We also show how to combine de-anonymization with link prediction---the latter is required to achieve good performance on the portion of the test set not de-anonymized---for example, by training the predictor on the de-anonymized portion of the test set and combining probabilistic predictions from de-anonymization and link prediction. Comment: 11 pages, 13 figures; submitted to IJCNN'2011
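    The seeding step the abstract mentions lends itself to a compact illustration. Below is a minimal Python sketch of simulated annealing for weighted graph matching between two small subgraphs (for instance, the highest-degree nodes of the attacker's own crawl and of the anonymized graph): it searches over node permutations to maximize edge-weight agreement. The scoring function, swap moves, and cooling schedule are illustrative assumptions, not the authors' exact algorithm.

```python
# A minimal sketch of simulated annealing for the seeding step: find a
# correspondence between the nodes of two small weighted graphs that
# maximizes edge-weight agreement. Parameters are assumptions.
import math
import random

import numpy as np

def match_score(A, B, perm):
    """Edge-weight agreement when node i of A is mapped to perm[i] of B."""
    P = B[np.ix_(perm, perm)]
    return -np.abs(A - P).sum()

def anneal_match(A, B, steps=20000, t0=1.0, cooling=0.9995):
    n = A.shape[0]
    perm = list(range(n))
    random.shuffle(perm)
    score, temp = match_score(A, B, perm), t0
    best_perm, best_score = perm[:], score
    for _ in range(steps):
        i, j = random.sample(range(n), 2)        # propose a transposition
        perm[i], perm[j] = perm[j], perm[i]
        new_score = match_score(A, B, perm)
        # accept improvements always; accept worse moves with Boltzmann probability
        if new_score >= score or random.random() < math.exp((new_score - score) / temp):
            score = new_score
            if score > best_score:
                best_score, best_perm = score, perm[:]
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo the rejected swap
        temp *= cooling
    return best_perm, best_score
```

    Once a reliable seed mapping is found, it can be propagated outward to de-anonymize further nodes, and the resulting identifications can be combined with a conventional link predictor, as the abstract describes.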

    De-anonymizing scale-free social networks by using spectrum partitioning method

    Social network data is widely shared, forwarded, and published to third parties, which leads to risks of privacy disclosure. Even though the network provider typically perturbs the data before publishing them, attackers can still recover the anonymized data from collected auxiliary information. In this paper, we transform de-anonymization into a node-matching problem on graphs, and our de-anonymization method reduces the number of nodes to be matched at each step. In addition, we use a spectrum partitioning method to divide the social graph into disjoint subgraphs, which allows the approach to scale to large social networks and to be executed in parallel on multiple processors. Through an analysis of the influence of the power-law degree distribution on de-anonymization, we jointly consider the structural and personal information of users, making each user's feature information more practical.
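    The spectrum partitioning step can be sketched with the graph Laplacian: recursively bisect the graph along its Fiedler vector until the blocks are small enough to be matched independently. The unnormalized Laplacian, the median split, and the stopping size below are assumptions made for illustration; the paper's own partitioning may differ in detail.

```python
# A rough sketch of spectral partitioning: split on the Fiedler vector
# (eigenvector of the second-smallest Laplacian eigenvalue), recurse.
import numpy as np

def spectral_bisect(adj):
    """Return two index sets splitting the graph by the Fiedler vector."""
    lap = np.diag(adj.sum(axis=1)) - adj      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]
    left = np.where(fiedler < np.median(fiedler))[0]
    right = np.where(fiedler >= np.median(fiedler))[0]
    if len(left) == 0 or len(right) == 0:     # degenerate split: fall back to halves
        mid = adj.shape[0] // 2
        left, right = np.arange(mid), np.arange(mid, adj.shape[0])
    return left, right

def partition(adj, nodes=None, max_size=50):
    """Recursively bisect until every block has at most max_size nodes."""
    if nodes is None:
        nodes = np.arange(adj.shape[0])
    if len(nodes) <= max_size:
        return [nodes]
    sub = adj[np.ix_(nodes, nodes)]
    left, right = spectral_bisect(sub)
    return partition(adj, nodes[left], max_size) + partition(adj, nodes[right], max_size)
```

    Each resulting block can then be handed to a separate processor, which is what makes the parallel matching the abstract mentions possible.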

    Vertical Federated Graph Neural Network for Recommender System

    Conventional recommender systems are required to train the recommendation model using a centralized database. However, due to data privacy concerns, this is often impractical when multiple parties are involved in recommender system training. Federated learning appears to be an excellent solution to the data isolation and privacy problem. Recently, graph neural networks (GNNs) have become a promising approach for federated recommender systems. However, a key challenge is to conduct embedding propagation while preserving the privacy of the graph structure. Few studies have been conducted on federated GNN-based recommender systems. Our study proposes the first vertical federated GNN-based recommender system, called VerFedGNN. We design a framework to transmit (i) the summation of neighbor embeddings using random projection, and (ii) gradients of public parameters perturbed by a ternary quantization mechanism. Empirical studies show that VerFedGNN matches the prediction accuracy of existing privacy-preserving GNN frameworks while offering stronger privacy protection for users' interaction information. Comment: 17 pages, 9 figures
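    The two transmissions described above can be sketched as follows, under assumed shapes and parameters: a party sums its neighbors' embeddings and sends a Johnson-Lindenstrauss-style random projection of the sum, and gradients of public parameters are ternary-quantized before transmission. The projection dimension k, the quantization threshold, and the scaling rule are illustrative assumptions, not VerFedGNN's exact mechanism.

```python
# Sketch of (i) random projection of a neighbor-embedding sum and
# (ii) ternary quantization of a gradient. Shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def project_neighbor_sum(neighbor_embeddings, k):
    """Sum the d-dim neighbor embeddings and project down to k < d dims."""
    d = neighbor_embeddings.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))  # JL-style projection
    return neighbor_embeddings.sum(axis=0) @ R

def ternary_quantize(grad, threshold=None):
    """Map each gradient entry to {-s, 0, +s}, hiding exact magnitudes."""
    if threshold is None:
        threshold = 0.7 * np.abs(grad).mean()  # TernGrad-style cutoff (assumption)
    mask = np.abs(grad) > threshold
    scale = np.abs(grad[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(grad) * mask

# Example: a party shares a 16-dim projection of its 64-dim embedding sum.
emb = rng.normal(size=(5, 64))                 # 5 neighbors, 64-dim embeddings
print(project_neighbor_sum(emb, k=16).shape)   # (16,)
```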

    Privacy and Anonymization of Neighborhoods in Multiplex Networks

    Since the beginning of the digital age, the amount of available data on human behaviour has dramatically increased, along with the risk to the privacy of the represented subjects. Since the analysis of those data can bring advances to science, it is important to share them while preserving the subjects' anonymity. A significant portion of the available information can be modelled as networks, introducing an additional privacy risk related to the structure of the data themselves. For instance, in a social network, people can be uniquely identifiable because of the structure of their neighborhood, formed by the number of their friends and the connections between them. The neighborhood's structure is the target of an identity disclosure attack on released social network data, called the neighborhood attack. To mitigate this threat, algorithms to anonymize networks have been proposed. However, this problem has not been deeply studied on multiplex networks, which combine different social network data into a single representation. The multiplex network representation makes the neighborhood attack setting more complicated and adds information that an attacker can use to re-identify subjects. This thesis aims to understand how multiplex networks behave in terms of anonymization difficulty and vulnerability to neighborhood attacks. We present two definitions of multiplex neighborhoods, and discuss how the fraction of nodes with unique neighborhoods can be affected. Through analysis of network models, we study the variation of the uniqueness of neighborhoods in networks with different structures and characteristics. We show that the uniqueness of neighborhoods follows a linear trend depending on the network size and average degree. If the network has a more random structure, the uniqueness decreases significantly as the network size increases. On the other hand, if the local structure is more pronounced, the uniqueness is not strongly influenced by the number of nodes. We also conduct a motif analysis to study the recurring patterns that can make social networks' neighborhoods less unique. Lastly, we propose an algorithm to anonymize a pair of multiplex neighborhoods; this algorithm is the core building block of a method to prevent neighborhood attacks on multiplex networks.
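    A concrete way to measure the uniqueness discussed here is to hash each node's neighborhood, i.e. the subgraph induced on its neighbors, and count how many hashes occur exactly once. The sketch below does this for a single-layer graph; using the Weisfeiler-Lehman graph hash as a practical stand-in for exact isomorphism testing, and the two example graph models, are assumptions of this sketch rather than the thesis's method.

```python
# Fraction of nodes whose neighborhood structure is unique in the graph.
from collections import Counter

import networkx as nx

def neighborhood_uniqueness(G):
    hashes = {}
    for v in G:
        nbrs = list(G.neighbors(v))
        sub = G.subgraph(nbrs)                 # structure among v's friends
        hashes[v] = (len(nbrs), nx.weisfeiler_lehman_graph_hash(sub))
    counts = Counter(hashes.values())
    unique = sum(1 for v in G if counts[hashes[v]] == 1)
    return unique / G.number_of_nodes()

# Example: a more random graph vs. one with pronounced local structure.
print(neighborhood_uniqueness(nx.erdos_renyi_graph(300, 0.03, seed=1)))
print(neighborhood_uniqueness(nx.watts_strogatz_graph(300, 8, 0.1, seed=1)))
```

    Running the two example models side by side mirrors the thesis's observation that uniqueness behaves differently in random graphs than in graphs with strong local structure.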

    Privacy Preserving User Data Publication In Social Networks

    Recent trends show that the popularity of Social Networks (SNs) has been increasing rapidly. From daily communication sites to online communities, an average person's daily life has become dependent on these online networks. Additionally, the number of people using at least one social network has increased drastically over the years. It is estimated that by the end of the year 2020, one-third of the world's population will have social accounts. Hence, user privacy protection has gained wide attention in the research community. It has also become evident that these networks need protection from unwanted intruders. In this dissertation, we consider data privacy on online social networks at the network level and the user level. Network-level privacy helps prevent information leakage to third parties such as advertisers. To achieve such privacy, we propose schemes that combine the privacy of all the elements of a social network: node, edge, and attribute privacy, by clustering users based on their attribute similarity. We combine the concepts of k-anonymity and l-diversity to achieve user privacy. To provide user-level privacy, we consider mobile social networks, where user location privacy is the most heavily compromised. We provide a distributed solution in which users in an area come together to achieve their desired privacy constraints. We also account for the mobility of the users and the network to obtain much better results.
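    The clustering idea behind the network-level schemes can be illustrated with a greedy procedure: group users by attribute similarity into clusters of at least k members (k-anonymity) that contain at least l distinct sensitive values (l-diversity). The distance measure, the greedy growth rule, and the handling of leftover users are assumptions for illustration, not the dissertation's exact scheme.

```python
# Greedy clustering sketch enforcing cluster size >= k and >= l distinct
# sensitive values per cluster. Strategy and metric are assumptions.
import numpy as np

def greedy_k_l_clusters(attrs, sensitive, k=3, l=2):
    """attrs: (n, d) numeric attribute matrix; sensitive: length-n labels."""
    unassigned = set(range(len(attrs)))
    clusters = []
    while len(unassigned) >= k:
        seed = unassigned.pop()
        members, values = [seed], {sensitive[seed]}
        # grow the cluster with nearby users until both constraints hold
        while unassigned and (len(members) < k or len(values) < l):
            rest = list(unassigned)
            dists = np.linalg.norm(attrs[rest] - attrs[seed], axis=1)
            order = np.argsort(dists)
            # once size k is met, prefer users adding a new sensitive value
            pick = next((rest[i] for i in order
                         if len(members) < k or sensitive[rest[i]] not in values),
                        rest[order[0]])
            unassigned.remove(pick)
            members.append(pick)
            values.add(sensitive[pick])
        clusters.append(members)
    if unassigned:                              # fewer than k remain: fold them in
        if clusters:
            clusters[-1].extend(unassigned)
        else:
            clusters.append(sorted(unassigned))
    return clusters
```

    Each released cluster can then be generalized (e.g. attribute ranges instead of exact values), so that every user is hidden among at least k attribute-similar peers with at least l possible sensitive values.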

    Privacy risk and de-anonymization in heterogeneous information networks

    Anonymized user datasets are often released for research or industry applications. As an example, t.qq.com released its anonymized users' profile, social interaction, and recommendation log data in KDD Cup 2012 to solicit recommendation algorithms. Since the entities (users and so on) and edges (links among entities) are of multiple types, the released social network is a heterogeneous information network. Prior work has shown how privacy can be compromised in homogeneous information networks by the use of specific types of graph patterns. We show how the extra information derived from heterogeneity can be used to relax these assumptions. To characterize and demonstrate this added threat, we formally define privacy risk in an anonymized heterogeneous information network to identify the vulnerability in the way such data may be released, and further present a new de-anonymization attack that exploits this vulnerability. Our attack successfully de-anonymized most individuals involved in the data. We further show that the general ideas of exploiting privacy risk and de-anonymizing heterogeneous information networks can be extended to more general graphs.
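    The extra identifying power of heterogeneity can be seen with a toy matching rule: in a heterogeneous network, each node has a degree per edge type (for example follows, retweets, comments) rather than a single degree, and this typed-degree signature is far more often unique. The edge representation and the signature-matching rule below are illustrative assumptions, not the paper's attack.

```python
# Re-identify anonymized nodes whose per-edge-type degree signature is
# unique in auxiliary data. A toy illustration of the heterogeneity threat.
from collections import Counter, defaultdict

def typed_degree_signature(edges, node):
    """edges: list of (u, v, edge_type); returns sorted per-type degree counts."""
    sig = Counter(t for u, v, t in edges if node in (u, v))
    return tuple(sorted(sig.items()))

def match_by_signature(anon_edges, aux_edges, anon_nodes, aux_nodes):
    """Map anonymized nodes to auxiliary identities sharing a unique signature."""
    aux_index = defaultdict(list)
    for v in aux_nodes:
        aux_index[typed_degree_signature(aux_edges, v)].append(v)
    mapping = {}
    for u in anon_nodes:
        candidates = aux_index.get(typed_degree_signature(anon_edges, u), [])
        if len(candidates) == 1:               # unique signature: re-identified
            mapping[u] = candidates[0]
    return mapping
```

    Collapsing all edge types into one (a homogeneous view) makes many of these signatures collide again, which is exactly why heterogeneity raises the privacy risk that the paper formalizes.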