    Verification in Privacy Preserving Data Publishing

    Privacy-preserving data publication is a major concern for both data owners and data publishers. Principles such as k-anonymity and l-diversity were proposed to reduce privacy violations. However, no studies were found on verifying anonymized data in terms of adversarial breach and anonymity levels, and anonymized data remains prone to attacks because of dependencies among quasi-identifiers and sensitive attributes. This paper presents a novel framework to detect those dependencies and a solution to reduce them. The advantages of our approach are that i) privacy violations can be detected, ii) the extent of privacy risk can be measured, and iii) vulnerable blocks of data can be re-anonymized. The work is further extended to show how adversarial breach knowledge increases when new tuples are added, and an on-the-fly solution to reduce it is discussed. Experimental results are reported and analyzed.
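
To make the notions of k-anonymity and l-diversity named in this abstract concrete, here is a minimal sketch that measures both levels on a small table. It is not the verification framework of the paper; the function name `anonymity_levels`, the column names, and the sample rows are hypothetical.

```python
from collections import defaultdict

def anonymity_levels(rows, quasi_identifiers, sensitive):
    """Return (k, l): the smallest block size and the smallest number of
    distinct sensitive values over all blocks of rows that share the same
    quasi-identifier values."""
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        blocks[key].append(row[sensitive])
    k = min(len(values) for values in blocks.values())
    l = min(len(set(values)) for values in blocks.values())
    return k, l

# Hypothetical, already generalized table (assumption for illustration).
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]
print(anonymity_levels(table, ["age", "zip"], "disease"))
```

In this sample every block has two rows, but the second block carries a single sensitive value, so it is the kind of vulnerable block a verification step would flag for re-anonymization.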

    Datalog Unchained

    This is the companion paper of a talk in the Gems of PODS series that reviews the development, starting at PODS 1988, of a family of Datalog-like languages with procedural, forward chaining semantics, providing an alternative to the classical declarative, model-theoretic semantics. These languages also provide a unified formalism that can express important classes of queries including fixpoint, while, and all computable queries. They can also incorporate updates and nondeterminism in a natural fashion. Datalog variants with forward chaining semantics have been adopted in a variety of settings, including active databases, production systems, distributed data exchange, and data-driven reactive systems.
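
As an illustration of forward chaining semantics, the sketch below evaluates the classic transitive-closure program by firing rules until no new fact appears. It covers only plain Datalog, not the extended languages surveyed in the paper; the function name `forward_chain` and the tuple encoding of facts are assumptions.

```python
def forward_chain(edges):
    """Naive forward chaining for the rules
         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    Rules are fired repeatedly until a fixpoint is reached."""
    edges = set(edges)
    path = set(edges)  # first rule: every edge is a path
    while True:
        # second rule: extend each known path by one edge
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path
        path |= new_facts

print(sorted(forward_chain({(1, 2), (2, 3), (3, 4)})))
```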

    Data Sharing Fundamentals: Definition and Characteristics

    The importance of data as a key resource is a universal theme dominating social and business life. In this regard, inter-organizational data sharing shines in a new light, prompting businesses to leverage its potential. However, it is still unclear what data sharing actually entails, i.e., what it means, what its potentials are, and what barriers one must overcome. In short, the concept lacks clarity and a clear description of its characteristics. The conceptual ambiguity and the synonymous use with data exchange in the literature are particularly problematic because they prevent a targeted conceptualization and use. The paper starts precisely at this point by proposing a unifying definition and characteristics of data sharing. We report on a systematic literature review characterizing data sharing and delineating it from data exchange.

    Small Circuits Imply Efficient Arthur-Merlin Protocols

    The inner product function ⟨x,y⟩ = Σ_i x_i y_i mod 2 can be easily computed by a (linear-size) AC⁰(⊕) circuit: that is, a constant depth circuit with AND, OR and parity (XOR) gates. But what if we impose the restriction that the parity gates can only be on the bottom most layer (closest to the input)? Namely, can the inner product function be computed by an AC⁰ circuit composed with a single layer of parity gates? This seemingly simple question is an important open question at the frontier of circuit lower bound research. In this work, we focus on a minimalistic version of the above question. Namely, whether the inner product function cannot be approximated by a small DNF augmented with a single layer of parity gates. Our main result shows that the existence of such a circuit would have unexpected implications for interactive proofs, or more specifically, for interactive variants of the Data Streaming and Communication Complexity models. In particular, we show that the existence of such a small (i.e., polynomial-size) circuit yields: 1) An O(d)-message protocol in the Arthur-Merlin Data Streaming model for every n-variate, degree d polynomial (over GF(2)), using only Õ(d)·log(n) communication and space complexity. In particular, this gives an AM[2] Data Streaming protocol for a variant of the well-studied triangle counting problem, with poly-logarithmic communication and space complexities. 2) A 2-message communication complexity protocol for any sparse (or low degree) polynomial, and for any function computable by an AC⁰(⊕) circuit. Specifically, for the latter, we obtain a protocol with communication complexity that is poly-logarithmic in the size of the AC⁰(⊕) circuit.
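
For reference, the inner product function over GF(2) discussed in this abstract can be written as a layer of AND gates followed by one parity gate. The sketch below uses a hypothetical helper name, `inner_product_mod2`; note that it places the parity gate at the top, whereas the open question concerns circuits whose parity gates sit only at the bottom layer.

```python
from functools import reduce
from operator import xor

def inner_product_mod2(x, y):
    """<x, y> = sum_i x_i * y_i mod 2 for two equal-length bit vectors:
    one AND gate per coordinate, then a single parity (XOR) gate."""
    and_layer = [xi & yi for xi, yi in zip(x, y)]
    return reduce(xor, and_layer, 0)

print(inner_product_mod2([1, 0, 1], [1, 1, 1]))
print(inner_product_mod2([1, 1, 0], [1, 0, 0]))
```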

    Triangle Counting in Dynamic Graph Streams

    Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered insert-only streams. We present a new algorithm estimating the number of triangles in dynamic graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph. Comment: New version of a SWAT 2014 paper with improved result
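
To show what a dynamic graph stream looks like, the sketch below maintains an exact triangle count under edge insertions and deletions. It is only a baseline that stores the whole graph, not the sampling and sparsification algorithm of the paper; the class name `ExactTriangleCounter` is hypothetical.

```python
from collections import defaultdict

class ExactTriangleCounter:
    """Exact baseline using memory linear in the number of edges.
    Streaming algorithms aim to approximate this count in far less space."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.triangles = 0

    def insert(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # every common neighbour of u and v closes one new triangle
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        if v not in self.adj[u]:
            return  # edge is not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # every common neighbour of u and v loses one triangle
        self.triangles -= len(self.adj[u] & self.adj[v])

counter = ExactTriangleCounter()
for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]:
    counter.insert(*edge)
print(counter.triangles)
counter.delete(1, 3)
print(counter.triangles)
```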

    Improved Algorithms for White-Box Adversarial Streams

    We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank. Comment: ICML 202

    Planar Matching in Streams Revisited

    We present data stream algorithms for estimating the size or weight of the maximum matching in low arboricity graphs. A large body of work has focused on improving the constant approximation factor for general graphs when the data stream algorithm is permitted O(n polylog n) space where n is the number of nodes. This space is necessary if the algorithm must return the matching. Recently, Esfandiari et al. (SODA 2015) showed that it was possible to estimate the maximum cardinality of a matching in a planar graph up to a factor of 24+epsilon using O(epsilon^{-2} n^{2/3} polylog n) space. We first present an algorithm (with a simple analysis) that improves this to a factor 5+epsilon using the same space. We also improve upon the previous results for other graphs with bounded arboricity. We then present a factor 12.5 approximation for matching in planar graphs that can be implemented using O(log n) space in the adjacency list data stream model where the stream is a concatenation of the adjacency lists of the graph. The main idea behind our results is finding "local" fractional matchings, i.e., fractional matchings where the value of any edge e is solely determined by the edges sharing an endpoint with e. Our work also improves upon the results for the dynamic data stream model where the stream consists of a sequence of edges being inserted and deleted from the graph. We also extend our results to weighted graphs, improving over the bounds given by Bury and Schwiegelshohn (ESA 2015), via a reduction to the unweighted problem that increases the approximation by at most a factor of two.
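
One simple instance of a "local" fractional matching, in the sense defined in this abstract, assigns each edge the value 1/max(deg(u), deg(v)). The sketch below illustrates the definition only and is not claimed to be the construction used in the paper; the function name `local_fractional_matching` is hypothetical, and a simple graph without duplicate edges is assumed.

```python
from collections import Counter

def local_fractional_matching(edges):
    """Give edge (u, v) the value 1 / max(deg(u), deg(v)).
    The value depends only on edges sharing an endpoint with (u, v), and the
    values at any vertex w sum to at most deg(w) * (1 / deg(w)) = 1."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 1.0 / max(degree[u], degree[v]) for u, v in edges}

# Path a - b - c: both edges get value 1/2, so vertex b carries total weight 1.
print(local_fractional_matching([("a", "b"), ("b", "c")]))
```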