    Hiding mobile traffic fingerprints with GLOVE

    Proceeding of: 11th ACM Conference on Emerging Networking Experiments and Technologies (ACM CoNEXT 2015), Heidelberg, Germany, 1-4 December 2015.
    Preservation of user privacy is paramount in the publication of datasets that contain fine-grained information about individuals. The problem is especially critical in the case of mobile traffic datasets collected by cellular operators, as they feature high subscriber trajectory uniqueness and they are resistant to anonymization through spatiotemporal generalization. In this work, we first unveil the reasons behind such undesirable features of mobile traffic datasets, by leveraging an original measure of the anonymizability of users' mobile fingerprints. Building on such findings, we propose GLOVE, an algorithm that grants k-anonymity of trajectories through specialized generalization. We evaluate our methodology on two nationwide mobile traffic datasets, and show that it achieves k-anonymity while preserving a substantial level of accuracy in the data.
    This work was supported by the French National Research Agency under grant ANR-13-INFR-0005 ABCD and by the EU FP7 ERA-NET program under grant CHIST-ERA-2012 MACACO.

    GLOVE: towards privacy-preserving publishing of record-level-truthful mobile phone trajectories

    Datasets of mobile phone trajectories collected by network operators offer an unprecedented opportunity to discover new knowledge from the activity of large populations of millions. However, publishing such trajectories also raises significant privacy concerns, as they contain personal data in the form of individual movement patterns. Privacy risks induce network operators to enforce restrictive confidentiality agreements on the rare occasions when they grant access to collected trajectories, whereas a less involved circulation of these data would fuel research and enable reproducibility in many disciplines. In this work, we contribute a building block toward the design of privacy-preserving datasets of mobile phone trajectories that are truthful at the record level. We present GLOVE, an algorithm that implements k-anonymity, hence solving the crucial unicity problem that affects this type of data while ensuring that the anonymized trajectories correspond to real-life users. GLOVE builds on original insights about the root causes behind the undesirable unicity of mobile phone trajectories, and leverages generalization and suppression to remove them. Proof-of-concept validations with large-scale real-world datasets demonstrate that the approach adopted by GLOVE allows preserving a substantial level of accuracy in the data, higher than that granted by previous methodologies.
    This work was supported by the Atracción de Talento Investigador program of the Comunidad de Madrid under Grant No. 2019-T1/TIC-16037 NetSense.

    Large-scale Mobile Traffic Analysis: a Survey

    This article surveys the literature on analyses of mobile traffic collected by operators within their network infrastructure. This is a recently emerged research field, and, apart from a few outliers, relevant works cover the period from 2005 to date, with a noticeable densification over the last three years. We provide a thorough review of the multidisciplinary activities that rely on mobile traffic datasets, identifying major categories and sub-categories in the literature, so as to outline a hierarchical classification of research lines. When detailing the works pertaining to each class, we balance a comprehensive view of state-of-the-art results with targeted discussions of the methodological aspects. Our approach provides a complete introductory guide to the research based on mobile traffic analysis. It allows summarizing the main findings of the current state of the art, as well as pinpointing important open research directions.

    The impact of technology on data collection: Case studies in privacy and economics

    Technological advancement can often act as a catalyst for scientific paradigm shifts. Today the ability to collect and process large amounts of data about individuals is arguably a paradigm-shift-enabling technology in action. One manifestation of this technology within the sciences is the ability to study historically qualitative fields with a more granular quantitative lens than ever before. Despite the potential of this technology, wide adoption is accompanied by some risks. In this thesis, I will present two case studies. The first focuses on the impact of machine learning in a cheapest-wins motor insurance market by designing a competition-based data collection mechanism. Pricing models in the insurance industry are changing from statistical methods to machine learning. In this game, close to 2000 participants, acting as insurance companies, trained and submitted pricing models to compete for profit using real motor insurance policies, with a roughly equal split between legacy and advanced models. With this trend towards machine learning in motion, preliminary analysis of the results suggests that future markets might realise cheaper prices for consumers. Additionally, legacy models competing against modern algorithms may experience a reduction in earning stability, accelerating machine learning adoption. Overall, the results of this field experiment demonstrate the potential for digital competition-based studies of markets in the future. The second case studies the privacy risks of data collection technologies. Despite a large body of research in re-identification of anonymous data, the question remains: if a dataset was big enough, would records become anonymous by being "lost in the crowd"? Using 3 months of location data, we show that the risk of re-identification decreases slowly with dataset size. This risk is modelled and extrapolated to larger populations, with 93% of people being uniquely identifiable using 4 points of auxiliary information among 60M people. These results show that the privacy of individuals is very unlikely to be preserved even in country-scale location datasets and that alternative paradigms of data sharing are still required.