
    Privacy Preserving Statistics

    Get PDF
    Over the past few years, there has been an increase in the development and improvement of circumvention tools such as Tor and Psiphon. These tools allow citizens of oppressive regimes to access websites freely, without fear of identification, and they aid democracy activists and journalists in West Africa in using the Internet securely. We developed a similar circumvention tool, which bypasses DNS and IP address blocking/filtering by leveraging technologies developed by criminal botnet enterprises. To improve and maintain the tool, it is important to quantify the number of users and their countries of origin. System statistics are reported to the US State Department, which funded this project, to show that the target users are taking advantage of the developed system. Because the system provides anonymity and bypasses DNS and IP filtering, and its users have a high demand for privacy, we must not collect sensitive user information. We therefore develop statistics that do not compromise user anonymity. Two probabilistic data structures are introduced, evaluated, improved upon, and used to keep system statistics without compromising user privacy. The first is the negative survey. Using a negative survey we can keep an aggregate count of users' countries of origin, without knowing the country of origin of any individual session, by asking each user to report a country that they do not belong to. The negative survey allows us to estimate how many accesses there have been from each country while collecting only insensitive information from each user. The second data structure is a probabilistic counting algorithm that estimates the number of distinct elements in a large collection of data without keeping a list of already encountered items, such as IP addresses. We use hash values to estimate the number of unique users of the system. The algorithm is based on statistical observations about the bits of hashed records; our records are the hash values of users' SSL certificates. For each certificate hash we record the position of its least significant set bit, and from the position of the lowest bit that was never set we obtain a good estimate of the number of system users. We contribute to this technique by determining when hash-value collisions begin to affect the estimate and using this information to produce a better estimate; this also allows us to decide on-line the proper register size to maintain.
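
    A minimal sketch of the two data structures described above, assuming uniform negative-survey responses and a single Flajolet-Martin-style bitmap register; the function names, country codes, and sample data below are illustrative, not taken from the deployed system.

import hashlib
import random

def negative_survey_estimate(reports, categories):
    """Estimate per-country access counts from negative-survey reports.

    Each user reports one country they do NOT belong to, chosen uniformly
    from the other c-1 categories. With n reports in total and r_j reports
    naming country j, an unbiased estimate of the true count for country j
    is t_j = n - (c - 1) * r_j.
    """
    n, c = len(reports), len(categories)
    observed = {cat: 0 for cat in categories}
    for r in reports:
        observed[r] += 1
    return {cat: n - (c - 1) * observed[cat] for cat in categories}

def lowest_set_bit(h):
    """1-based position of the least significant set bit of integer h."""
    return (h & -h).bit_length()

def estimate_unique_users(cert_hashes):
    """Flajolet-Martin-style estimate of the number of distinct users.

    For every SSL-certificate hash we record the position of its lowest set
    bit in a bitmap; the position R of the lowest bit that was never set
    yields the estimate 2^R / 0.77351, without storing any identifier.
    """
    bitmap = 0
    for cert in cert_hashes:
        h = int(hashlib.sha256(cert).hexdigest(), 16)
        bitmap |= 1 << (lowest_set_bit(h) - 1)
    R = 0
    while bitmap & (1 << R):      # lowest bit position that never occurred
        R += 1
    return 2 ** R / 0.77351       # standard Flajolet-Martin correction

if __name__ == "__main__":
    countries = ["GH", "NG", "SN", "US"]
    truth = ["GH"] * 500 + ["NG"] * 300 + ["SN"] * 150 + ["US"] * 50
    reports = [random.choice([c for c in countries if c != t]) for t in truth]
    print(negative_survey_estimate(reports, countries))

    certs = [("session-%d" % i).encode() for i in range(1000)]
    print(round(estimate_unique_users(certs)))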

    PPS: Privacy-preserving statistics using RFID tags

    Get PDF
    As RFID applications enter our daily life, many new security and privacy challenges arise. However, current research in RFID security focuses mainly on simple authentication and privacy-preserving identification. In this paper, we discuss the possibility of widening the scope of RFID security and privacy by introducing a new application scenario. The suggested application consists of computing statistics on private properties of individuals stored in RFID tags. The main requirement is to compute global statistics while preserving the privacy of individual readings. PPS assures the privacy of the properties stored in each tag through the combination of homomorphic encryption and aggregation at the readers. Re-encryption is used to prevent tracking of users. The readers scan tags and forward the aggregate of their encrypted readings to the back-end server. The back-end server then decrypts the aggregates it receives and updates the global statistics accordingly. PPS is provably privacy-preserving. Moreover, tags can be very simple, since they are not required to perform any kind of computation, but only to store data.
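
    A hedged sketch of the PPS flow, with textbook Paillier encryption standing in for the paper's additively homomorphic scheme; the tiny hard-coded primes and all names below are illustrative assumptions, nowhere near secure parameters.

import math
import random

# Toy Paillier key; the back-end server holds (lam, mu), everyone knows (n, g).
p, q = 293, 433                      # demo primes, NOT a secure key size
n, g = p * q, p * q + 1
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def encrypt(m):
    """Value written on the tag at issuance: Enc(m) = g^m * r^n mod n^2."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def re_encrypt(c):
    """Reader-side re-randomization (multiply by Enc(0)) to prevent tracking."""
    return (c * encrypt(0)) % n2

def aggregate(ciphertexts):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    agg = 1
    for c in ciphertexts:
        agg = (agg * c) % n2
    return agg

def decrypt(c):
    """The back-end server only ever decrypts aggregates, never single tags."""
    x = pow(c, lam, n2)
    return ((x - 1) // n) * mu % n

# Example: each tag stores 1 if its private property holds, 0 otherwise.
tag_values = [1, 0, 1, 1, 0, 1]
tags = [encrypt(v) for v in tag_values]
scanned = [re_encrypt(c) for c in tags]     # reader scans and re-encrypts
print(decrypt(aggregate(scanned)))          # server learns only the sum: 4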

    Privacy-Preserving Verification of Clinical Research

    Get PDF
    We treat the problem of privacy-preserving statistics verification in clinical research. We show that, given aggregated results from statistical calculations, we can verify their correctness efficiently, without revealing any of the private inputs used for the calculation. Our construction is based on the primitive of Secure Multi-Party Computation from Shamir's Secret Sharing. Our setting involves three parties: a hospital, which owns the private inputs; a clinical researcher, who lawfully processes the sensitive data to produce an aggregated statistical result; and a third party (usually several verifiers) assigned to verify this result for reliability and transparency reasons. Our solution guarantees that these verifiers learn only the aggregated results (and what can be inferred from those about the underlying private data) and nothing more. By taking advantage of the particular scenario at hand (where certain intermediate results, e.g., the mean over the dataset, are available in the clear) and utilizing secret sharing primitives, our approach turns out to be practically efficient, which we underpin with several experiments on real patient data. Our results show that privacy-preserving verification of the most commonly used statistical operations in clinical research is an important use case in which secure multi-party computation becomes employable in practice.
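
    The linearity of Shamir's scheme is what makes such a check cheap. Below is a sketch under simplified assumptions (a small prime field, three verifiers, verification of a sum/mean only; all names and the sample data are illustrative).

import random

PRIME = 2**61 - 1            # demo field; large enough for the sums below

def share(secret, n_shares, threshold):
    """Split `secret` into Shamir shares (x, f(x)) over GF(PRIME)."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n_shares + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the shared value."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Hospital's private inputs and the researcher's published aggregate.
inputs = [72, 65, 80, 91, 68]
claimed_sum = 376                    # i.e. a claimed mean of 75.2 over 5 records

VERIFIERS = THRESHOLD = 3
# The hospital sends one share of every input to every verifier.
per_verifier = [[] for _ in range(VERIFIERS)]
for value in inputs:
    for idx, sh in enumerate(share(value, VERIFIERS, THRESHOLD)):
        per_verifier[idx].append(sh)

# Each verifier adds its shares locally: a share of the sum, nothing more.
sum_shares = [(v[0][0], sum(y for _, y in v) % PRIME) for v in per_verifier]

# Reconstructing from the summed shares reveals only the aggregate.
print(reconstruct(sum_shares) == claimed_sum % PRIME)   # True: result verified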

    Privacy-Friendly Mobility Analytics using Aggregate Location Data

    Get PDF
    Location data can be extremely useful to study commuting patterns and disruptions, as well as to predict real-time traffic volumes. At the same time, however, the fine-grained collection of user locations raises serious privacy concerns, as it can reveal sensitive information about the users, such as lifestyle, political and religious inclinations, or even identities. In this paper, we study the feasibility of crowd-sourced mobility analytics over aggregate location information: users periodically report their location using a privacy-preserving aggregation protocol, so that the server can only recover aggregates -- i.e., how many, but not which, users are in a region at a given time. We experiment with real-world mobility datasets obtained from the Transport for London authority and the San Francisco Cabs network, and present a novel methodology based on time series modeling that is geared to forecast traffic volumes in regions of interest and to detect mobility anomalies in them. In the presence of anomalies, we also make enhanced traffic volume predictions by feeding our model with additional information from correlated regions. Finally, we present and evaluate a mobile app prototype, called Mobility Data Donors (MDD), in terms of computation, communication, and energy overhead, demonstrating the real-world deployability of our techniques. Comment: Published at ACM SIGSPATIAL 201
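
    As an illustration of the kind of aggregation the methodology relies on, here is a minimal sketch assuming a simple two-aggregator additive-secret-sharing protocol (the modulus, the region grid, and all names are assumptions, not the paper's actual protocol): each user one-hot encodes its current region and splits the vector into two random shares, so each aggregator sees only noise and their combination yields only per-region counts.

import random

MOD = 2**32
REGIONS = 4                      # e.g. a coarse grid over a city

def share_location(region):
    """Return two additive shares of the user's one-hot region vector."""
    one_hot = [1 if r == region else 0 for r in range(REGIONS)]
    share_a = [random.randrange(MOD) for _ in range(REGIONS)]
    share_b = [(v - a) % MOD for v, a in zip(one_hot, share_a)]
    return share_a, share_b

# Users report during one time slot; each aggregator accumulates its shares only.
true_regions = [0, 0, 1, 3, 3, 3, 2]
acc_a = [0] * REGIONS
acc_b = [0] * REGIONS
for region in true_regions:
    a, b = share_location(region)
    acc_a = [(x + y) % MOD for x, y in zip(acc_a, a)]
    acc_b = [(x + y) % MOD for x, y in zip(acc_b, b)]

# Combining the two accumulators yields the per-region counts for this slot,
# which feed the time-series models for forecasting and anomaly detection.
counts = [(x + y) % MOD for x, y in zip(acc_a, acc_b)]
print(counts)                    # [2, 1, 1, 3]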

    Distributed Private Heavy Hitters

    Full text link
    In this paper, we give efficient algorithms and lower bounds for solving the heavy hitters problem while preserving differential privacy in the fully distributed local model. In this model, there are n parties, each of which possesses a single element from a universe of size N. The heavy hitters problem is to find the identity of the most common element shared amongst the n parties. In the local model, there is no trusted database administrator, so the algorithm must interact with each of the n parties separately, using a differentially private protocol. We give tight information-theoretic upper and lower bounds on the accuracy to which this problem can be solved in the local model (giving a separation between the local model and the more common centralized model of privacy), as well as computationally efficient algorithms, even in the case where the data universe N may be exponentially large.
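
    A hedged illustration of the local model, using plain k-ary randomized response as the locally differentially private mechanism (the paper's own protocols are more refined; the universe size, epsilon, and names here are assumptions): each party perturbs its element before sending it, and the untrusted server debiases the noisy counts to find the heavy hitter.

import math
import random
from collections import Counter

def randomize(value, universe, eps):
    """Each party perturbs its element locally before sending it."""
    k = len(universe)
    keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < keep:
        return value
    return random.choice([u for u in universe if u != value])

def estimate_counts(reports, universe, eps):
    """Server-side debiasing of the noisy reports."""
    k, n = len(universe), len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1 / (math.exp(eps) + k - 1)
    raw = Counter(reports)
    return {u: (raw[u] - n * q) / (p - q) for u in universe}

universe = list(range(16))                  # small demo universe
data = [3] * 600 + [7] * 250 + [11] * 150   # element 3 is the heavy hitter
reports = [randomize(x, universe, eps=2.0) for x in data]
estimates = estimate_counts(reports, universe, eps=2.0)
print(max(estimates, key=estimates.get))    # recovers 3 with high probability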