86 research outputs found
Verification in Privacy Preserving Data Publishing
Privacy-preserving data publication is a major concern for both data owners and data publishers. Principles such as k-anonymity and l-diversity were proposed to reduce privacy violations. However, no studies were found on verifying anonymized data in terms of adversarial breach and anonymity levels, and anonymized data is still prone to attacks because of dependencies among quasi-identifiers and sensitive attributes. This paper presents a novel framework to detect those dependencies and a solution to reduce them. The advantages of our approach are that i) privacy violations can be detected, ii) the extent of privacy risk can be measured, and iii) re-anonymization can be done on vulnerable blocks of data. The work is further extended to show how adversarial breach knowledge eventually increases when new tuples are added, and an on-the-fly solution to reduce it is discussed. Experimental results are reported and analyzed.
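As a point of reference for the anonymity levels mentioned in this abstract, the following minimal sketch measures k and l for a published table using the standard definitions of k-anonymity and (distinct) l-diversity. It is not the paper's verification framework; the function name `anonymity_levels`, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def anonymity_levels(rows, quasi_identifiers, sensitive):
    """Return (k, l) for a table given as a list of dicts.

    k is the size of the smallest block of rows sharing the same
    quasi-identifier values; l is the smallest number of distinct
    sensitive values found in any such block.
    """
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        blocks[key].append(row[sensitive])
    k = min(len(values) for values in blocks.values())
    l = min(len(set(values)) for values in blocks.values())
    return k, l


# Hypothetical generalized table: two blocks of two rows each.
rows = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

k, l = anonymity_levels(rows, ["age", "zip"], "disease")
print(k, l)
```

In this sample the second block holds a single sensitive value, so the table is 2-anonymous but only 1-diverse. That is the kind of vulnerable block that re-anonymization would target.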
Datalog Unchained
This is the companion paper of a talk in the Gems of PODS series that reviews the development, starting at PODS 1988, of a family of Datalog-like languages with procedural, forward chaining semantics, providing an alternative to the classical declarative, model-theoretic semantics. These languages also provide a unified formalism that can express important classes of queries including fixpoint, while, and all computable queries. They can also incorporate in a natural fashion updates and nondeterminism. Datalog variants with forward chaining semantics have been adopted in a variety of settings, including active databases, production systems, distributed data exchange, and data-driven reactive systems.
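Forward chaining can be illustrated with a small sketch: rules are fired repeatedly on the facts derived so far until nothing new appears. The example below evaluates the textbook transitive-closure program naively; the function name `forward_chain` and the sample edges are hypothetical, and the sketch covers only plain positive Datalog, not the extended languages the paper reviews.

```python
def forward_chain(edges):
    """Naive forward chaining for the two-rule program

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    Facts are pairs; the loop stops at the least fixpoint.
    """
    path = set(edges)  # first rule: every edge is a path
    while True:
        # Second rule: extend each known path by one edge.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path  # no rule produces anything new
        path |= new_facts


edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(forward_chain(edges)))
```

Each pass of the loop corresponds to one round of rule firing, which is the procedural reading of the program, as opposed to defining the result as a minimal model.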
Data Sharing Fundamentals: Definition and Characteristics
The importance of data as a key resource is a universal theme dominating social and business life. In this regard, inter-organizational data sharing shines in a new light prompting businesses to leverage their potential. However, it is still unclear what data sharing actually entails, i.e., what it means, what its potentials are, and what barriers one must overcome. In short, it lacks conceptual clarity and a clear description of its characteristics. The conceptual ambiguity and the synonymous use with data exchange in the literature are particularly problematic, which prevents a targeted conceptualization and use. The paper starts precisely at this point as it proposes a unifying definition and characteristics of data sharing. We report on a systematic literature review characterizing data sharing and delineating it from data exchange
Small Circuits Imply Efficient Arthur-Merlin Protocols
The inner product function ⟨x,y⟩ = Σ_i x_i y_i mod 2 can be easily computed by a (linear-size) AC^0(⊕) circuit: that is, a constant depth circuit with AND, OR and parity (XOR) gates. But what if we impose the restriction that the parity gates can only be on the bottom most layer (closest to the input)? Namely, can the inner product function be computed by an AC^0 circuit composed with a single layer of parity gates? This seemingly simple question is an important open question at the frontier of circuit lower bound research.
In this work, we focus on a minimalistic version of the above question. Namely, whether the inner product function cannot be approximated by a small DNF augmented with a single layer of parity gates. Our main result shows that the existence of such a circuit would have unexpected implications for interactive proofs, or more specifically, for interactive variants of the Data Streaming and Communication Complexity models. In particular, we show that the existence of such a small (i.e., polynomial-size) circuit yields:
1) An O(d)-message protocol in the Arthur-Merlin Data Streaming model for every n-variate, degree d polynomial (over GF(2)), using only Õ(d)·log(n) communication and space complexity. In particular, this gives an AM[2] Data Streaming protocol for a variant of the well-studied triangle counting problem, with poly-logarithmic communication and space complexities.
2) A 2-message communication complexity protocol for any sparse (or low degree) polynomial, and for any function computable by an AC^0(⊕) circuit. Specifically, for the latter, we obtain a protocol with communication complexity that is poly-logarithmic in the size of the AC^0(⊕) circuit.
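For concreteness, the inner product function over GF(2) discussed in this abstract can be written directly as an AND of matching bits followed by a parity. This is only a sketch of the function itself, with a hypothetical name `inner_product_mod2`; it says nothing about circuit depth or about the protocols the paper derives.

```python
def inner_product_mod2(x, y):
    """Return the sum over i of x_i * y_i, reduced mod 2, for two bit vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Bitwise AND gives each product x_i * y_i; the parity of their sum
    # is the inner product over GF(2).
    return sum(a & b for a, b in zip(x, y)) % 2


print(inner_product_mod2([1, 0, 1, 1], [1, 1, 1, 0]))
print(inner_product_mod2([1, 1], [1, 0]))
```

The first call has two positions where both bits are 1, so the parity is 0; the second has exactly one, so the parity is 1.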
Triangle Counting in Dynamic Graph Streams
Estimating the number of triangles in graph streams using a limited amount of
memory has become a popular topic in the last decade. Different variations of
the problem have been studied, depending on whether the graph edges are
provided in an arbitrary order or as incidence lists. However, with a few
exceptions, the algorithms have considered insert-only streams. We
present a new algorithm estimating the number of triangles in dynamic
graph streams where edges can be both inserted and deleted. We show that our
algorithm achieves better time and space complexity than previous solutions for
various graph classes, for example sparse graphs with a relatively small number
of triangles. Also, for graphs with constant transitivity coefficient, a common
situation in real graphs, this is the first algorithm achieving constant
processing time per edge. The result is achieved by a novel approach combining
sampling of vertex triples and sparsification of the input graph. In the course
of the analysis of the algorithm we present a lower bound on the number of
pairwise independent 2-paths in general graphs which might be of independent
interest. At the end of the paper we discuss lower bounds on the space
complexity of triangle counting algorithms that make no assumptions on the
structure of the graph.
Comment: New version of a SWAT 2014 paper with improved result
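To make the dynamic-stream setting concrete, the sketch below keeps an exact triangle count while edges are inserted and deleted. It is a plain baseline that stores the whole graph, so it uses far more memory than the sampling and sparsification approach the abstract describes, and it is not that algorithm. The class name `ExactTriangleCounter` and the sample stream are hypothetical.

```python
from collections import defaultdict


class ExactTriangleCounter:
    """Exact triangle count for an undirected graph under edge updates."""

    def __init__(self):
        self.adj = defaultdict(set)  # vertex -> set of neighbours
        self.triangles = 0

    def insert(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbour of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        if v not in self.adj[u]:
            return  # edge is not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbour of u and v loses one triangle.
        self.triangles -= len(self.adj[u] & self.adj[v])


counter = ExactTriangleCounter()
for u, v in [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]:
    counter.insert(u, v)
print(counter.triangles)
counter.delete(1, 3)
print(counter.triangles)
```

In the sample stream the edge (1, 3) belongs to both triangles, so deleting it brings the count from 2 back to 0.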
Improved Algorithms for White-Box Adversarial Streams
We study streaming algorithms in the white-box adversarial stream model,
where the internal state of the streaming algorithm is revealed to an adversary
who adaptively generates the stream updates, but the algorithm obtains fresh
randomness unknown to the adversary at each time step. We incorporate
cryptographic assumptions to construct robust algorithms against such
adversaries. We propose efficient algorithms for sparse recovery of vectors,
low rank recovery of matrices and tensors, as well as low rank plus sparse
recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our
algorithms can report when the input is not sparse or low rank even in the
presence of such an adversary. We use these recovery algorithms to improve upon
and solve new problems in numerical linear algebra and combinatorial
optimization on white-box adversarial streams. For example, we give the first
efficient algorithm for outputting a matching in a graph with insertions and
deletions to its edges provided the matching size is small, and otherwise we
declare the matching size is large. We also improve the approximation versus
memory tradeoff of previous work for estimating the number of non-zero elements
in a vector and computing the matrix rank.
Comment: ICML 202
Planar Matching in Streams Revisited
We present data stream algorithms for estimating the size or weight of the maximum matching in low arboricity graphs. A large body of work has focused on improving the constant approximation factor for general graphs when the data stream algorithm is permitted O(n polylog n) space where n is the number of nodes. This space is necessary if the algorithm must return the matching. Recently, Esfandiari et al. (SODA 2015) showed that it was possible to estimate the maximum cardinality of a matching in a planar graph up to a factor of 24+epsilon using O(epsilon^{-2} n^{2/3} polylog n) space. We first present an algorithm (with a simple analysis) that improves this to a factor 5+epsilon using the same space. We also improve upon the previous results for other graphs with bounded arboricity. We then present a factor 12.5 approximation for matching in planar graphs that can be implemented using O(log n) space in the adjacency list data stream model where the stream is a concatenation of the adjacency lists of the graph. The main idea behind our results is finding "local" fractional matchings, i.e., fractional matchings where the value of any edge e is solely determined by the edges sharing an endpoint with e. Our work also improves upon the results for the dynamic data stream model where the stream consists of a sequence of edges being inserted and deleted from the graph. We also extend our results to weighted graphs, improving over the bounds given by Bury and Schwiegelshohn (ESA 2015), via a reduction to the unweighted problem that increases the approximation by at most a factor of two