921 research outputs found

    Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge

    Full text link
    This paper describes the winning entry to the IJCNN 2011 Social Network Challenge run by Kaggle.com. The goal of the contest was to promote research on real-world link prediction, and the dataset was a graph obtained by crawling the popular Flickr social photo sharing website, with user identities scrubbed. By de-anonymizing much of the competition test set using our own Flickr crawl, we were able to effectively game the competition. Our attack represents a new application of de-anonymization to gaming machine learning contests, suggesting changes in how future competitions should be run. We introduce a new simulated annealing-based weighted graph matching algorithm for the seeding step of de-anonymization. We also show how to combine de-anonymization with link prediction---the latter is required to achieve good performance on the portion of the test set not de-anonymized---for example by training the predictor on the de-anonymized portion of the test set, and combining probabilistic predictions from de-anonymization and link prediction.Comment: 11 pages, 13 figures; submitted to IJCNN'201

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    Full text link
    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies poses a challenge to gaining a profound understanding of the field. In response to this, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin our research, related studies from the period of 2007-2022 were collected from the Scopus Database then pre-processed. Following this, the VOSviewer was used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, the application of co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, which offers a deeper understanding of the current state and an insightful roadmap for future research in this domain.Comment: 73 pages, 28 figure

    A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

    Get PDF
    This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization

    Location Privacy in the Era of Big Data and Machine Learning

    Get PDF
    Location data of individuals is one of the most sensitive sources of information that once revealed to ill-intended individuals or service providers, can cause severe privacy concerns. In this thesis, we aim at preserving the privacy of users in telecommunication networks against untrusted service providers as well as improving their privacy in the publication of location datasets. For improving the location privacy of users in telecommunication networks, we consider the movement of users in trajectories and investigate the threats that the query history may pose on location privacy. We develop an attack model based on the Viterbi algorithm termed as Viterbi attack, which represents a realistic privacy threat in trajectories. Next, we propose a metric called transition entropy that helps to evaluate the performance of dummy generation algorithms, followed by developing a robust dummy generation algorithm that can defend users against the Viterbi attack. We compare and evaluate our proposed algorithm and metric on a publicly available dataset published by Microsoft, i.e., Geolife dataset. For privacy preserving data publishing, an enhanced framework for anonymization of spatio-temporal trajectory datasets termed the machine learning based anonymization (MLA) is proposed. The framework consists of a robust alignment technique and a machine learning approach for clustering datasets. The framework and all the proposed algorithms are applied to the Geolife dataset, which includes GPS logs of over 180 users in Beijing, China
    • …
    corecore