Privacy Preserving Statistics
Over the past few years, there has been an increase in the development and improvement of circumvention tools such as Tor and Psiphon. These tools let citizens of oppressive regimes access websites freely without fear of identification, and they help democracy activists and journalists in West Africa use the Internet securely. We developed a similar circumvention tool. It circumvents DNS and IP address blocking/filtering by leveraging technologies developed by criminal botnet enterprises. To improve and maintain this tool, it is important to quantify the number of users and their countries of origin. System statistics are also used to give feedback to the US State Department, which funded this project, to show that target users are taking advantage of the system. Because the system provides anonymity to users as well as bypassing DNS and IP filtering, and its users have a high demand for privacy, we must not collect sensitive user information. We therefore develop statistics that do not compromise user anonymity. Two probabilistic data structures are introduced, evaluated, improved upon, and used to keep system statistics without compromising user privacy. The first is the negative survey: by asking each user to report a country that they do not belong to, we can keep an aggregate count of users' countries of origin without knowing the country of origin of any individual session. The negative survey lets us calculate how many accesses there have been from each country while storing only non-sensitive user information. The second is a probabilistic counting algorithm that estimates the number of distinct elements in a large collection of data, such as IP addresses, without keeping a list of the elements already encountered. We use hash values to obtain the number of unique users of the system.
This algorithm is based on statistical observations made on the bits of hashed values of records. Our records contain the hash values of the users' SSL certificates. We store the position of the least significant bit that is set in each SSL certificate hash; from the position of the lowest bit that is not set in this record, we get a good estimate of the number of system users. We contribute to this technique by determining when the number of hash collisions will affect the estimate, and we use this amount to give a better estimate. This also allows us to decide on-line the proper register size to maintai
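The negative-survey count can be illustrated with a minimal Python sketch. It assumes each user reports a country drawn uniformly at random from the c - 1 countries they do not belong to (the abstract does not state the distribution). Under that assumption a country with N_i true users receives (n - N_i)/(c - 1) reports on average, so N_i can be estimated as n - (c - 1)·r_i, where r_i is the number of reports naming it. The function names, country codes, and counts are hypothetical.

```python
import random
from collections import Counter

def negative_report(true_country, countries, rng):
    """Report a country the user does NOT belong to, chosen uniformly (assumption)."""
    others = [c for c in countries if c != true_country]
    return rng.choice(others)

def estimate_counts(reports, countries):
    """Unbiased estimate of each country's true count from negative reports.

    A country with N_i users is named, on average, (n - N_i) / (c - 1) times,
    so N_i is estimated as n - (c - 1) * r_i.
    """
    n, c = len(reports), len(countries)
    tally = Counter(reports)
    return {k: n - (c - 1) * tally[k] for k in countries}

# Hypothetical example: four countries, 10,000 sessions.
countries = ["GH", "NG", "SN", "CI"]
rng = random.Random(42)
truth = ["NG"] * 6000 + ["GH"] * 3000 + ["SN"] * 1000   # no "CI" users
reports = [negative_report(t, countries, rng) for t in truth]
estimates = estimate_counts(reports, countries)
print(estimates)  # each single report only rules out one country
```

The estimates always sum to n exactly, while any single stored report only says which country a session is *not* from.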
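The counting step resembles Flajolet–Martin probabilistic counting. The minimal Python sketch below records, for each hash, the position of its least significant set bit, and estimates the number of distinct items from the lowest unset position R. It assumes SHA-256 as the hash and averages over m = 64 bitmaps (the PCSA variant) to reduce variance; both choices are assumptions, and the abstract's collision-based refinement is not reproduced. Byte strings stand in for users' SSL certificates, and the `PCSA` class is a hypothetical name.

```python
import hashlib

PHI = 0.77351  # Flajolet–Martin correction constant

def _hash64(item: bytes) -> int:
    """First 64 bits of SHA-256 (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(item).digest()[:8], "big")

def _rho(x: int) -> int:
    """0-based position of the least significant set bit (capped if x == 0)."""
    return (x & -x).bit_length() - 1 if x else 63

class PCSA:
    def __init__(self, m: int = 64):
        self.m = m
        self.bitmaps = [0] * m

    def add(self, item: bytes) -> None:
        h = _hash64(item)
        bucket, rest = h % self.m, h // self.m
        self.bitmaps[bucket] |= 1 << _rho(rest)   # set bit at lowest-set-bit position

    def estimate(self) -> float:
        total_r = 0
        for b in self.bitmaps:
            r = 0
            while b & (1 << r):                     # R = lowest bit that is NOT set
                r += 1
            total_r += r
        return self.m / PHI * 2 ** (total_r / self.m)

sketch = PCSA(m=64)
for i in range(10_000):
    sketch.add(f"cert-{i}".encode())   # stand-in for certificate bytes
first = sketch.estimate()
for i in range(5_000):
    sketch.add(f"cert-{i}".encode())   # repeated users do not change any bitmap
print(round(first), sketch.estimate() == first)
```

Only the bitmaps are stored, never the hashes themselves, and duplicates leave the estimate unchanged.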
PPS: Privacy-preserving statistics using RFID tags
As RFID applications are entering our daily life, many new
security and privacy challenges arise. However, current research
in RFID security focuses mainly on simple authentication
and privacy-preserving identification. In this paper,
we discuss the possibility of widening the scope of RFID
security and privacy by introducing a new application scenario.
The suggested application consists of computing statistics
on private properties of individuals stored in RFID tags.
The main requirement is to compute global statistics while
preserving the privacy of individual readings. PPS assures
the privacy of properties stored in each tag through the combination
of homomorphic encryption and aggregation at the
readers. Re-encryption is used to prevent tracking of users.
The readers scan tags and forward the aggregate of their
encrypted readings to the back-end server. The back-end
server then decrypts the aggregates it receives and updates
the global statistics accordingly. PPS is provably privacy-preserving.
Moreover, tags can be very simple since they are
not required to perform any kind of computation, but only
to store data.
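The abstract does not name the homomorphic scheme. The sketch below assumes exponential ElGamal, which is additively homomorphic and supports re-encryption, over a deliberately tiny toy group. Each tag holds an encryption of one private bit, a reader multiplies the ciphertexts it scans to aggregate them, and the back-end server decrypts only the total. All parameters, function names, and data are hypothetical.

```python
import random
from functools import reduce

# Toy group (hypothetical, far too small for real use): p = 2q + 1 with q prime,
# and G = 4 generates the subgroup of prime order q in Z_p^*.
P, Q, G = 2039, 1019, 4

def keygen(rng):
    x = rng.randrange(1, Q)            # back-end server's private key
    return x, pow(G, x, P)             # (x, public key h = g^x mod p)

def encrypt(h, m, rng):
    """Exponential ElGamal: Enc(m) = (g^r, g^m * h^r) mod p."""
    r = rng.randrange(1, Q)
    return pow(G, r, P), (pow(G, m, P) * pow(h, r, P)) % P

def add(c, d):
    """Component-wise product: Enc(a) * Enc(b) is an encryption of a + b."""
    return (c[0] * d[0]) % P, (c[1] * d[1]) % P

def reencrypt(h, c, rng):
    """Multiply by a fresh Enc(0): same plaintext, new-looking ciphertext."""
    return add(c, encrypt(h, 0, rng))

def decrypt_sum(x, c):
    """Compute g^m = c2 / c1^x, then find m by brute force (small sums only)."""
    gm = (c[1] * pow(c[0], P - 1 - x, P)) % P
    for m in range(Q):
        if pow(G, m, P) == gm:
            return m
    raise ValueError("sum out of range")

rng = random.Random(7)
x, h = keygen(rng)
# Each tag stores an encrypted private bit (e.g. "has property X"); values are hypothetical.
tags = [encrypt(h, bit, rng) for bit in [1, 0, 1, 1, 0]]
tags = [reencrypt(h, c, rng) for c in tags]     # re-encryption between readings
aggregate = reduce(add, tags)                   # computed at the reader
print(decrypt_sum(x, aggregate))                # the server learns only the total
```

Note that the tags themselves perform no computation here; encryption, re-encryption, and aggregation happen outside the tag.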
Privacy-Preserving Verification of Clinical Research
We treat the problem of privacy-preserving statistics verification in clinical research. We show that, given aggregated results from statistical calculations, we can verify their correctness efficiently without revealing any of the private inputs used for the calculation. Our construction is based on Secure Multi-Party Computation built from Shamir's Secret Sharing. Our setting involves three parties: a hospital, which owns the private inputs; a clinical researcher, who lawfully processes the sensitive data to produce an aggregated statistical result; and a third party (usually several verifiers) assigned to verify this result for reliability and transparency reasons. Our solution guarantees that these verifiers learn only the aggregated results (and what can be inferred from them about the underlying private data) and nothing more. By taking advantage of the particular scenario at hand, where certain intermediate results (e.g., the mean over the dataset) are available in the clear, and by using secret-sharing primitives, our approach is practically efficient, which we underpin with several experiments on real patient data. Our results show that privacy-preserving verification of the statistical operations most commonly used in clinical research is an important use case in which secure multi-party computation becomes practical.
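As an illustration only, the minimal Python sketch below checks a claimed sum (the simplest aggregate behind a mean) using Shamir shares. Because Shamir shares are additively homomorphic, each verifier can add its shares of the individual values locally to obtain a share of the total, and reconstructing that share set reveals only the total. The field modulus, threshold, function names, and patient values are assumptions; the paper's protocols for other statistics are not reproduced, and all shares are generated in one process purely for demonstration.

```python
import random

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus (assumption)

def share(secret, t, n, rng):
    """Split secret into n Shamir shares; any t of them reconstruct it."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t - 1)]
    return [(i, sum(a * pow(i, k, PRIME) for k, a in enumerate(coeffs)) % PRIME)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total

rng = random.Random(0)
patient_values = [120, 135, 128]            # hypothetical private inputs (hospital)
n_verifiers, threshold = 3, 2
per_patient = [share(v, threshold, n_verifiers, rng) for v in patient_values]

# Verifier i adds up the i-th share of every value, obtaining a share of the sum.
sum_shares = [(i + 1, sum(s[i][1] for s in per_patient) % PRIME)
              for i in range(n_verifiers)]

claimed_sum = 383                            # researcher's published aggregate
recovered = reconstruct(sum_shares[:threshold])
print(recovered == claimed_sum)
```

Any `threshold` of the verifiers' summed shares suffice, while fewer reveal nothing about the individual inputs.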
Privacy-Friendly Mobility Analytics using Aggregate Location Data
Location data can be extremely useful to study commuting patterns and
disruptions, as well as to predict real-time traffic volumes. At the same time,
however, the fine-grained collection of user locations raises serious privacy
concerns, as this can reveal sensitive information about the users, such as
their lifestyle, political and religious inclinations, or even identities. In this
paper, we study the feasibility of crowd-sourced mobility analytics over
aggregate location information: users periodically report their location, using
a privacy-preserving aggregation protocol, so that the server can only recover
aggregates -- i.e., how many, but not which, users are in a region at a given
time. We experiment with real-world mobility datasets obtained from the
Transport For London authority and the San Francisco Cabs network, and present
a novel methodology based on time series modeling that is geared to forecast
traffic volumes in regions of interest and to detect mobility anomalies in
them. In the presence of anomalies, we also make enhanced traffic volume
predictions by feeding our model with additional information from correlated
regions. Finally, we present and evaluate a mobile app prototype, called
Mobility Data Donors (MDD), in terms of computation, communication, and energy
overhead, demonstrating the real-world deployability of our techniques.
Comment: Published at ACM SIGSPATIAL 201
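The abstract does not specify the aggregation protocol. One common construction, assumed here, has each pair of users share a random mask that one adds and the other subtracts, so all masks cancel in the server's sum and only the count of users in the region survives. In the minimal Python sketch below, a single seeded RNG stands in for pairwise key agreement, dropouts are not handled, and all names and data are hypothetical.

```python
import random

MOD = 2**32  # reports are added modulo 2^32 (assumption)

def pairwise_masks(n_users, rng):
    """Mask for each user; each pair's contributions cancel in the sum."""
    masks = [0] * n_users
    for i in range(n_users):
        for j in range(i + 1, n_users):
            r = rng.randrange(MOD)       # stands in for a key shared by i and j
            masks[i] = (masks[i] + r) % MOD
            masks[j] = (masks[j] - r) % MOD
    return masks

def blind(in_region, mask):
    """A user's report: 1 if in the region this time slot, else 0, plus its mask."""
    return (int(in_region) + mask) % MOD

def server_aggregate(reports):
    """The server sums blinded reports; only the count survives."""
    return sum(reports) % MOD

rng = random.Random(3)
presence = [True, False, True, True, False]   # hypothetical users in one region/time slot
masks = pairwise_masks(len(presence), rng)
reports = [blind(p, m) for p, m in zip(presence, masks)]
print(server_aggregate(reports))
```

Each individual report looks uniformly random to the server, which is what lets it learn "how many, but not which" users are present.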
Distributed Private Heavy Hitters
In this paper, we give efficient algorithms and lower bounds for solving the
heavy hitters problem while preserving differential privacy in the fully
distributed local model. In this model, there are n parties, each of which
possesses a single element from a universe of size N. The heavy hitters problem
is to find the identity of the most common element shared amongst the n
parties. In the local model, there is no trusted database administrator, and so
the algorithm must interact with each of the parties separately, using a
differentially private protocol. We give tight information-theoretic upper and
lower bounds on the accuracy to which this problem can be solved in the local
model (giving a separation between the local model and the more common
centralized model of privacy), as well as computationally efficient algorithms
even in the case where the data universe N may be exponentially large
- …
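The paper's algorithms target very large universes. As a simpler baseline only, the minimal Python sketch below uses generalized (k-ary) randomized response, a standard ε-locally-differentially-private mechanism for a small universe of size k: each party reports its true element with probability p = e^ε/(e^ε + k - 1) and otherwise a uniformly random other element. The aggregator debiases the counts and picks the element with the highest estimate. This is not the paper's method, and the parameters and data are hypothetical.

```python
import math
import random
from collections import Counter

def grr_report(value, k, eps, rng):
    """Keep the true value w.p. p, else report a uniform *other* value."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    return other if other < value else other + 1

def estimate_frequencies(reports, k, eps):
    """Unbiased counts: (observed - n*q) / (p - q)."""
    n, e = len(reports), math.exp(eps)
    p, q = e / (e + k - 1), 1 / (e + k - 1)
    counts = Counter(reports)
    return [(counts[v] - n * q) / (p - q) for v in range(k)]

def heavy_hitter(reports, k, eps):
    est = estimate_frequencies(reports, k, eps)
    return max(range(k), key=lambda v: est[v])

rng = random.Random(1)
k, eps = 4, 2.0
# Hypothetical population: half the parties hold element 0, the rest are spread over 1..3.
values = [0] * 10_000 + [rng.randrange(1, k) for _ in range(10_000)]
reports = [grr_report(v, k, eps, rng) for v in values]
print(heavy_hitter(reports, k, eps))
```

Because the ratio p/q equals e^ε, no single report changes its likelihood by more than that factor, and the aggregator interacts with each party only through its one randomized report.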