Efficient Summing over Sliding Windows
This paper considers the problem of maintaining statistic aggregates over the
last W elements of a data stream. First, the problem of counting the number of
1's in the last W bits of a binary stream is considered. A lower bound of
Ω(1/ε + log W) memory bits for Wε-additive approximations is derived. This is
followed by an algorithm whose memory consumption is O(1/ε + log W) bits,
showing that the algorithm is optimal and that the bound is tight. Next, the
more general problem of maintaining a sum of the last W integers, each in the
range {0, 1, ..., R}, is addressed. The paper shows that approximating the sum
within an additive error of RWε can also be done using Θ(1/ε + log W) bits for
ε = Ω(1/W). For ε = o(1/W), we present a succinct algorithm which uses
B(1 + o(1)) bits, where B = Θ(W log(1/(Wε))) is the derived lower bound. We
show that all lower bounds generalize to randomized algorithms as well. All
algorithms process new elements and answer queries in O(1) worst-case time.
Comment: A shorter version appears in SWAT 201
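The block-based intuition behind such additive approximations can be sketched as follows. This is a simplified illustration under assumed semantics, not the paper's bit-optimal algorithm: the stream is split into blocks of about Wε bits, one count is kept per block, and a query errs by at most one block.

```python
from collections import deque

class ApproxWindowCounter:
    """Additive W*eps-approximate count of 1s in the last W bits.

    Illustrative block-based sketch: it stores per-block integer counts,
    so it does not match the O(1/eps + log W) *bit* bound of the paper's
    optimal algorithm. Class and attribute names are ours, not the paper's.
    """

    def __init__(self, W, eps):
        self.W = W
        self.b = max(1, int(W * eps))  # block size; error is below one block
        self.blocks = deque()          # 1-counts of completed blocks
        self.cur = 0                   # 1s in the current (open) block
        self.pos = 0                   # bits seen in the current block

    def add(self, bit):
        self.cur += bit
        self.pos += 1
        if self.pos == self.b:
            self.blocks.append(self.cur)
            self.cur = self.pos = 0
            # keep just enough completed blocks to cover a window of W bits
            while len(self.blocks) * self.b > self.W:
                self.blocks.popleft()

    def query(self):
        # off by fewer than b <= W*eps from the exact windowed count
        return sum(self.blocks) + self.cur
```

The oldest stored block may straddle the window boundary, which is where the additive error (strictly less than one block, hence below Wε) comes from.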
k-anonymous Microdata Release via Post Randomisation Method
The problem of the release of anonymized microdata is an important topic in
the fields of statistical disclosure control (SDC) and privacy preserving data
publishing (PPDP), and yet it remains largely unsolved. In these research
fields, k-anonymity has been widely studied as an anonymity notion for mainly
deterministic anonymization algorithms, and some probabilistic relaxations have
been developed. However, they are not sufficient due to their limitations,
i.e., being weaker than the original k-anonymity or requiring strong parametric
assumptions. First, we propose Pk-anonymity, a new probabilistic k-anonymity,
and prove that Pk-anonymity is a mathematical extension of k-anonymity rather
than a relaxation. Furthermore, Pk-anonymity requires no parametric
assumptions. This property is significant in that it enables us to compare the
privacy levels of probabilistic microdata release algorithms with those of
deterministic ones. Second, we apply Pk-anonymity to the post
randomization method (PRAM), which is an SDC algorithm based on randomization.
PRAM is proven to satisfy Pk-anonymity in a controlled way, i.e., one can
control PRAM's parameters so that Pk-anonymity is satisfied. On the other hand,
PRAM is also known to satisfy ε-differential privacy, a recent popular and
strong privacy notion. This means that our results significantly enhance PRAM,
since they imply the satisfaction of both important notions: k-anonymity and
ε-differential privacy.
Comment: 22 pages, 4 figures
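The randomization underlying PRAM can be illustrated by a minimal retain-or-resample perturbation of one categorical attribute. This is a hedged sketch: the function name and parameters below are illustrative, not the paper's notation or its actual parameter control.

```python
import random

def pram(value, categories, p_keep):
    """Perturb one categorical attribute, PRAM-style (illustrative sketch).

    With probability p_keep the original value is retained; otherwise a
    category is drawn uniformly at random. For k = len(categories), this
    retain-or-uniform scheme satisfies eps-differential privacy per
    attribute with eps = ln(1 + k * p_keep / (1 - p_keep)) when p_keep < 1.
    """
    if random.random() < p_keep:
        return value
    return random.choice(categories)
```

Lower `p_keep` means stronger randomization (smaller ε) but noisier released microdata; choosing it is exactly the kind of parameter control the abstract refers to.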
Tabular anonymization techniques: concepts and implementation
Probabilistic Dataset Reconstruction from Interpretable Models
Interpretability is often pointed out as a key requirement for trustworthy
machine learning. However, learning and releasing models that are inherently
interpretable leaks information regarding the underlying training data. As such
disclosure may directly conflict with privacy, a precise quantification of the
privacy impact of such a breach is a fundamental problem. For instance, previous
work has shown that the structure of a decision tree can be leveraged to build
a probabilistic reconstruction of its training dataset, with the uncertainty of
the reconstruction being a relevant metric for the information leak. In this
paper, we propose a novel framework generalizing these probabilistic
reconstructions in the sense that it can handle other forms of interpretable
models and more generic types of knowledge. In addition, we demonstrate that
under realistic assumptions regarding the interpretable models' structure, the
uncertainty of the reconstruction can be computed efficiently. Finally, we
illustrate the applicability of our approach on both decision trees and rule
lists, by comparing the theoretical information leak associated with either
exact or heuristic learning algorithms. Our results suggest that optimal
interpretable models are often more compact and leak less information regarding
their training data than greedily-built ones, for a given accuracy level.
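To make the notion of reconstruction uncertainty concrete, here is a toy calculation for a tree over binary features; it is an assumption-laden sketch, not the paper's actual framework or measure. The path to each leaf pins down the features it tests, and under a uniform prior every unconstrained binary feature of every sample in a leaf contributes one bit of entropy to the probabilistic reconstruction.

```python
def reconstruction_entropy(leaves, n_features):
    """Total uncertainty (in bits) of a probabilistic reconstruction of a
    decision tree's training set over binary features -- toy sketch only.

    `leaves` is a list of (n_samples, n_fixed) pairs: the path to each
    leaf constrains n_fixed of the n_features binary attributes, leaving
    the rest uniformly random (1 bit per sample per free feature).
    """
    return sum(n * (n_features - fixed) for n, fixed in leaves)
```

Under this toy measure, deeper paths (larger `n_fixed`) leave less entropy, i.e., leak more about the training data, which matches the abstract's intuition that uncertainty is a relevant metric for the information leak.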
XYZ Privacy
Future autonomous vehicles will generate, collect, aggregate and consume
significant volumes of data as key gateway devices in emerging Internet of
Things scenarios. While vehicles are widely accepted as one of the most
challenging mobility contexts in which to achieve effective data
communications, less attention has been paid to the privacy of data emerging
from these vehicles. The quality and usability of such privatized data will lie
at the heart of future safe and efficient transportation solutions.
In this paper, we present the XYZ Privacy mechanism. XYZ Privacy is to our
knowledge the first such mechanism that enables data creators to submit
multiple contradictory responses to a query, whilst preserving utility measured
as the absolute error from the actual original data. The functionalities are
achieved in both a scalable and secure fashion. For instance, individual
location data can be obfuscated while preserving utility, thereby enabling the
scheme to transparently integrate with existing systems (e.g. Waze). A new
cryptographic primitive, Function Secret Sharing, is used to achieve
non-attributable writes, and we show an order of magnitude improvement over
the default implementation.
Comment: arXiv admin note: text overlap with arXiv:1708.0188
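The non-attributable write can be illustrated with plain XOR secret sharing. This is a naive stand-in: real Function Secret Sharing compresses the two shares to roughly log-sized keys, whereas this sketch sends full-size shares; the function and parameter names are ours, purely illustrative.

```python
import secrets

def share_write(index, value, table_size):
    """Toy non-attributable write via 2-party XOR secret sharing.

    Returns two shares that XOR to a one-hot vector holding `value`
    (a byte, 0-255) at `index`. Either share alone is uniformly random,
    so a single server learns nothing about the write location or value.
    """
    share_a = [secrets.randbits(8) for _ in range(table_size)]
    share_b = list(share_a)
    share_b[index] ^= value  # shares now XOR to the one-hot write
    return share_a, share_b
```

Each server XORs the share it receives into its local table; XOR-combining the two tables afterwards reveals the write, while neither server individually can attribute it to a position.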