143 research outputs found

    Privacy-preserving Transactions on the Web

    There is rapid growth in the number of applications using sensitive and personal information on the World Wide Web. This growth creates an urgent need to maintain the anonymity of the participants in many web transactions and to preserve the privacy of their sensitive data during data dissemination over the web. First, maintaining the anonymity of users on the World Wide Web is essential for a number of web applications. Anonymity cannot be assured by a single interested individual or organization but requires participation from other web nodes owned by other entities. Second, preserving the privacy of sensitive data is another very important issue in web transactions. Today, exchanging and sharing personal data between various participants in web transactions endangers privacy. In this article, we discuss various research directions and challenges that need to be addressed in order to maintain the anonymity of participants and preserve the privacy of sensitive data in web transactions. To maintain the anonymity of participants in a web transaction, we propose a method based on a modified form of the club mechanism with economic incentives, a solution which rests upon the Prisoner’s Dilemma approach. We compare our approach to other well-known data-sharing approaches such as Crowds, Tor, Tarzan and LPWA. To maintain the privacy of sensitive data, we propose a solution based on privacy-preserving data dissemination (P2D2). We also present a solution to implement our approach using Semantic Web Rule Languages and Jena, a Java-based inference engine.

    Survey of Privacy-Preserving Data Publishing Methods and Speedy: a multi-threaded algorithm preserving k-anonymity

    Nowadays, many organizations, enterprises and public services collect and manage vast amounts of personal information. Typical examples of such datasets include clinical tests conducted in hospitals, query logs held by search engines, social data produced by social networks, and financial data from public sector information systems. These datasets often need to be published for research or statistical studies without revealing sensitive information about the individuals they describe. The anonymization process is more complicated than simply hiding attributes that can directly identify an individual (name, SSN etc.) from the published dataset. Even without these attributes, an adversary can cause privacy leakage by cross-linking with other publicly available datasets or by exploiting some form of background knowledge. Therefore, privacy preservation in data publishing has gained considerable attention in recent years, with several privacy models proposed in the literature. In this thesis, we discuss the most common attacks that can be made on published datasets and we present state-of-the-art privacy guarantees and anonymization algorithms to counter these attacks. Furthermore, we propose a novel multi-threaded anonymization algorithm which exploits the capabilities of modern CPUs to speed up the anonymization process while achieving k-anonymity in the anonymized dataset.
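The k-anonymity guarantee named in the abstract above can be illustrated with a small check: a table is k-anonymous with respect to a set of quasi-identifier attributes if every combination of their values is shared by at least k records. The sketch below is only an illustration of that definition, not the Speedy algorithm; the function name, attribute names, and sample records are assumptions made for the example.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Group records by their tuple of quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalized records (illustrative only).
records = [
    {"age": "30-39", "zip": "152**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "152**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "153**", "diagnosis": "flu"},
]

holds_for_k1 = is_k_anonymous(records, ["age", "zip"], 1)
holds_for_k2 = is_k_anonymous(records, ["age", "zip"], 2)
```

Here the third record is alone in its group, so the table satisfies the property for k = 1 but not for k = 2; an anonymization algorithm would have to generalize or suppress values further to reach k = 2.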

    Privacy Preservation in High-dimensional Trajectory Data for Passenger Flow Analysis

    The increasing use of location-aware devices provides many opportunities for analyzing and mining human mobility. The trajectory of a person can be represented as a sequence of visited locations with different timestamps. Storing, sharing, and analyzing personal trajectories may pose new privacy threats. Previous studies have shown that employing traditional privacy models and anonymization methods often leads to low information quality in the resulting data. In this thesis we propose a method for achieving anonymity in a trajectory database while preserving the information needed to support effective passenger flow analysis. Specifically, we first extract the passenger flowgraph, which is a commonly employed representation for modeling uncertain moving objects, from the raw trajectory data. We then anonymize the data with the goal of minimizing the impact on the flowgraph. Extensive experimental results on both synthetic and real-life data sets suggest that the framework is effective in overcoming the special challenges of trajectory data anonymization, namely high dimensionality, sparseness, and sequentiality.
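As a rough illustration of the first step described above, a flowgraph can be approximated by counting how many passengers move directly from one location to the next. This is a simplified sketch under the assumption that a trajectory is a list of (location, timestamp) pairs; the thesis may define the flowgraph differently, and the function name and sample data are hypothetical.

```python
from collections import Counter

def build_flowgraph(trajectories):
    """Count transitions between consecutive locations across all trajectories."""
    edges = Counter()
    for trajectory in trajectories:
        # Order visits by timestamp so consecutive pairs are real transitions.
        ordered = sorted(trajectory, key=lambda visit: visit[1])
        for (src, _), (dst, _) in zip(ordered, ordered[1:]):
            edges[(src, dst)] += 1
    return edges

# Hypothetical trajectories: (station, timestamp) pairs.
trajectories = [
    [("A", 1), ("B", 2), ("C", 3)],
    [("A", 1), ("B", 5)],
]
flowgraph = build_flowgraph(trajectories)
```

An anonymization step that aims to preserve passenger flow would then try to keep these edge counts as close as possible to the original values.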

    ρ-uncertainty Anonymization by Partial Suppression

    We present a novel framework for set-valued data anonymization by partial suppression. The framework applies regardless of the amount of background knowledge the attacker possesses, and can be adapted to both space-time and quality-time trade-offs in a “pay-as-you-go” approach. While minimizing the number of item deletions, the framework attempts to either preserve the original data distribution or retain useful, mineable association rules, targeting statistical analysis and association mining, two major data mining applications on set-valued data.

    Privacy-Preserving Design of Data Processing Systems in the Public Transport Context

    The public transport network of a region inhabited by more than 4 million people is run by a complex interplay of public and private actors. Large amounts of data are generated by travellers buying and using various forms of tickets and passes. Analysing the data is of paramount importance for the governance and sustainability of the system. This manuscript reports the early results of the privacy analysis being undertaken as part of the analysis of the clearing process in the Emilia-Romagna region, in Italy, which will compute the compensations for tickets bought from one operator and used with another. The manuscript shows by means of examples that the clearing data may be used to violate various privacy aspects regarding users, as well as (technically equivalent) trade secrets regarding operators. The ensuing discussion has a twofold goal. First, it reviews possible existing solutions, both from the literature on general privacy-preserving techniques and from similar scenarios being discussed in various cities across the world; the former are found to exhibit structural deficiencies in effectiveness, while the latter are found to be of limited applicability, typically involving less demanding requirements. Second, it traces a research path towards a more effective approach to privacy-preserving data management in the specific context of public transport, both by refinement of current sanitization techniques and by application of the privacy by design approach. Available at: https://aisel.aisnet.org/pajais/vol7/iss4/4

    Publishing data from electronic health records while preserving privacy: a survey of algorithms

    The dissemination of Electronic Health Records (EHRs) can be highly beneficial for a range of medical studies, from clinical trials to epidemic control studies, but it must be performed in a way that preserves patients’ privacy. This is not straightforward, because the disseminated data need to be protected against several privacy threats while remaining useful for subsequent analysis tasks. In this work, we present a survey of algorithms that have been proposed for publishing structured patient data in a privacy-preserving way. We review more than 45 algorithms, derive insights on their operation, and highlight their advantages and disadvantages. We also discuss some promising directions for future research in this area.

    What the Surprising Failure of Data Anonymization Means for Law and Policy

    Paul Ohm is an Associate Professor of Law at the University of Colorado Law School. He writes in the areas of information privacy, computer crime law, intellectual property, and criminal procedure. Through his scholarship and outreach, Professor Ohm is leading efforts to build new interdisciplinary bridges between law and computer science. Before becoming a law professor, Professor Ohm served as a federal prosecutor for the U.S. Department of Justice in the computer crimes unit. Before law school, he worked as a computer programmer and network systems administrator

    Anonymizing large transaction data using MapReduce

    Publishing transaction data is important to applications such as marketing research and biomedical studies. Privacy is a concern when publishing such data since they often contain person-specific sensitive information. To address this problem, different data anonymization methods have been proposed. These methods have focused on protecting the associated individuals from different types of privacy leaks as well as preserving the utility of the original data. However, all these methods are sequential and designed to process data on a single machine, and hence do not scale to large datasets. Recently, MapReduce has emerged as a highly scalable platform for data-intensive applications. In this work, we consider how MapReduce may be used to provide scalability in large transaction data anonymization. More specifically, we consider how set-based generalization methods such as RBAT (Rule-Based Anonymization of Transaction data) may be parallelized using MapReduce. Set-based generalization methods have some desirable features for transaction anonymization, but their highly iterative nature makes parallelization challenging. RBAT is a good representative of such methods. We propose a method for transaction data partitioning and representation. We also present two MapReduce-based parallelizations of RBAT. Our methods ensure scalability when the number of transaction records and the domain of items are large. Our preliminary results show that a direct parallelization of RBAT by partitioning data alone can result in significant overhead, which can offset the gains from parallel processing. We propose MR-RBAT, which generalizes our direct parallel method and allows the parallelization overhead to be controlled. Our experimental results show that MR-RBAT can scale linearly to large datasets and to the available resources while retaining good data utility.
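The data-partitioning idea mentioned above can be illustrated with a toy map/reduce computation of item supports, a quantity that transaction anonymization methods typically need. This is not RBAT or MR-RBAT, and it runs in a single process rather than on a MapReduce cluster; the function names and sample partitions are assumptions made for the example.

```python
from collections import Counter
from functools import reduce

def map_partition(transactions):
    """Map phase: count, within one partition, how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # each item counted once per transaction
    return counts

def reduce_counts(left, right):
    """Reduce phase: merge the partial counts of two partitions."""
    return left + right

# Hypothetical transaction data, already split into two partitions.
partitions = [
    [{"a", "b"}, {"a"}],
    [{"b", "c"}, {"a", "c"}],
]
support = reduce(reduce_counts, map(map_partition, partitions), Counter())
```

Each partition is processed independently, so the map phase could run in parallel; the overhead the abstract refers to arises when an iterative method must repeat such rounds many times.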