
    Differential Privacy in Metric Spaces: Numerical, Categorical and Functional Data Under the One Roof

    We study differential privacy in the abstract setting of probability on metric spaces. Numerical, categorical and functional data can be handled in a uniform manner in this setting. We demonstrate how mechanisms based on data sanitisation and those that rely on adding noise to query responses fit within this framework. We prove that once the sanitisation is differentially private, then so is the query response for any query. We show how to construct sanitisations for high-dimensional databases using simple 1-dimensional mechanisms. We also provide lower bounds on the expected error for differentially private sanitisations in the general metric space setting. Finally, we consider the question of sufficient sets for differential privacy and show that, for relaxed differential privacy, any algebra generating the Borel σ-algebra is a sufficient set.
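    The abstract does not spell out its mechanisms, but a minimal sketch of the standard 1-dimensional Laplace mechanism, the textbook noise-on-query-response construction, shows the kind of simple building block the paper composes into high-dimensional sanitisations. The function name and interface below are illustrative, not taken from the paper.

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Release a real-valued query answer with epsilon-differential privacy.

    Noise is drawn from Laplace(0, sensitivity / epsilon), the standard
    calibration for a query whose value changes by at most `sensitivity`
    when a single record of the database changes.
    """
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# Example: privately release a count (sensitivity 1) with epsilon = 0.5.
noisy_count = laplace_mechanism(true_answer=42.0, sensitivity=1.0, epsilon=0.5)
```

    Applying such a 1-dimensional mechanism coordinate-wise is one simple way to assemble a sanitisation of a high-dimensional database, in the spirit of the abstract's claim.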

    Characterizing the Sample Complexity of Private Learners

    In 2008, Kasiviswanathan et al. defined private learning as a combination of PAC learning and differential privacy. Informally, a private learner is applied to a collection of labeled individual information and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private learners for (finite) concept classes, with sample complexity logarithmic in the size of the concept class. This sample complexity is higher than what is needed for non-private learners, leaving open the possibility that the sample complexity of private learning is sometimes significantly higher than that of non-private learning. We give a combinatorial characterization of the sample size necessary and sufficient to privately learn a class of concepts. This characterization is analogous to the well-known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of a probabilistic representation of a concept class, and our new complexity measure, RepDim, corresponds to the size of the smallest probabilistic representation of the concept class. We show that any private learning algorithm for a concept class C with sample complexity m implies RepDim(C) = O(m), and that there exists a private learning algorithm with sample complexity m = O(RepDim(C)). We further demonstrate that a similar characterization holds for the database size needed for privately computing a large class of optimization problems, and also for the well-studied problem of private data release.
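    For intuition, the generic construction of Kasiviswanathan et al. can be sketched as the exponential mechanism run over the finite concept class, scoring each hypothesis by its empirical accuracy. The helper below is a hedged illustration: the name and interface are mine, and details such as privacy accounting are simplified.

```python
import math
import random

def generic_private_learner(sample, concept_class, epsilon):
    """Select a hypothesis via the exponential mechanism.

    `sample` is a list of (x, y) labeled examples and `concept_class` a
    finite list of hypotheses h with h(x) in {0, 1}. Each hypothesis is
    chosen with probability proportional to exp(epsilon * score / 2),
    where score counts correctly labeled examples (changing one example
    changes any score by at most 1).
    """
    scores = [sum(1 for x, y in sample if h(x) == y) for h in concept_class]
    top = max(scores)  # shift scores to avoid overflow in exp()
    weights = [math.exp(epsilon * (s - top) / 2) for s in scores]
    return random.choices(concept_class, weights=weights, k=1)[0]
```

    Roughly log |C| examples suffice for this mechanism to concentrate on low-error hypotheses, which is the logarithmic sample complexity quoted above; RepDim refines the bound by replacing |C| with the size of the smallest probabilistic representation.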

    Enabling Efficient Fuzzy Keyword Search over Encrypted Data in Cloud Computing

    As cloud computing becomes prevalent, more and more sensitive information is being centralized in the cloud. To protect data privacy, sensitive data usually have to be encrypted before outsourcing, which makes effective data utilization a very challenging task. Although traditional searchable encryption schemes allow a user to securely search over encrypted data through keywords and selectively retrieve files of interest, these techniques support only exact keyword search. That is, there is no tolerance of minor typos and format inconsistencies, which are typical user searching behavior and happen very frequently. This significant drawback makes existing techniques unsuitable in cloud computing, as it greatly affects system usability, rendering user searching experiences very frustrating and system efficacy very low. In this paper, for the first time we formalize and solve the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy. Fuzzy keyword search greatly enhances system usability by returning the matching files when users' searching inputs exactly match the predefined keywords or, when exact match fails, the closest possible matching files based on keyword similarity semantics. In our solution, we exploit edit distance to quantify keyword similarity and develop two advanced techniques for constructing fuzzy keyword sets, which achieve optimized storage and representation overheads. We further propose a new symbol-based trie-traverse searching scheme, in which a multi-way tree structure is built up using symbols transformed from the resulting fuzzy keyword sets. Through rigorous security analysis, we show that our proposed solution is secure and privacy-preserving, while correctly realizing the goal of fuzzy keyword search. Extensive experimental results demonstrate the efficiency of the proposed solution.
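    To see why fuzzy keyword sets can be stored compactly, consider the wildcard idea often used in this line of work for edit distance 1: a single '*' stands in for any one edit operation, so one entry covers a whole family of misspellings. The sketch below illustrates that idea; it is not necessarily either of the paper's two techniques.

```python
def wildcard_fuzzy_set(word: str) -> set:
    """Wildcard-based fuzzy keyword set for edit distance 1.

    A '*' inserted in a gap covers any single-character insertion there;
    a '*' replacing a character covers both substitution and deletion of
    that character. Two words within edit distance 1 share at least one
    entry, so matching reduces to a set intersection instead of
    enumerating every concrete variant.
    """
    fuzzy = {word}
    for i in range(len(word) + 1):       # insertion in each gap
        fuzzy.add(word[:i] + "*" + word[i:])
    for i in range(len(word)):           # substitution or deletion
        fuzzy.add(word[:i] + "*" + word[i + 1:])
    return fuzzy

# "cat" yields O(len(word)) entries, versus the
# O(len(word) * alphabet_size) naive enumeration of all variants.
print(sorted(wildcard_fuzzy_set("cat")))
```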

    Topics in Massive Data Summarization.

    We consider three problems in this thesis. First, we want to construct a nearly workload-optimal histogram: given B, find a near-optimal B-bucket histogram under an associated workload w, within a 1 + epsilon error tolerance. In the cash register model, where data is streamed as a series of updates, we can build such a histogram using polylogarithmic space, polylogarithmic time to process each item, and polylogarithmic post-processing time. All these results need the workload to be stored explicitly, since we show that if the workload is summarized lossily in small space, algorithmic results such as the above do not exist. Second, we consider the problem of private computation of approximate heavy hitters. Alice and Bob each hold a vector and, in the vector sum, they want to find the B largest values along with their indices. We show how to solve the problem privately with polylogarithmic communication, polynomial work, and constantly many rounds, in the sense that nothing is learned by Alice and Bob beyond what is implied by their input, the ideal top-B output, and the goodness of approximation (equivalently, the Euclidean norm of the vector sum). We give lower bounds showing that the Euclidean norm must be leaked by any efficient algorithm. Third, we want to build a near-optimal histogram on probabilistic data streams: given B, find a near-optimal B-bucket histogram under both the L1 and L2 measures. We give deterministic algorithms without sampling, again using polylogarithmic space, polylogarithmic time to process each item, and polylogarithmic post-processing time. The result under the L2 measure is within a 1 + epsilon error tolerance; the result under the L1 measure is a heuristic, and we outline a direction for obtaining guarantees for it.
    Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/60841/1/xuanzh_1.pd
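    For concreteness, the optimization problem underlying the first and third results can be solved exactly offline by a classic dynamic program. The hedged sketch below handles the unweighted L2 objective in O(n²B) time, with none of the workload weighting or polylogarithmic-space streaming machinery that is the thesis's actual contribution; names are illustrative.

```python
def optimal_histogram(data, B):
    """Exact offline B-bucket histogram under the L2 (sum of squared
    error) objective, via the classic O(n^2 * B) dynamic program.

    Returns (minimum total error, bucket boundaries), where each bucket
    data[j:i] is represented by its mean. This is the baseline that a
    streaming algorithm approximates within 1 + epsilon while storing
    only a polylogarithmic-size sketch.
    """
    n = len(data)
    # Prefix sums of values and of squares, so sse(j, i) is O(1).
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        ps[i + 1] = ps[i] + v
        ps2[i + 1] = ps2[i] + v * v

    def sse(j, i):  # squared error of one bucket covering data[j:i]
        s, s2, m = ps[i] - ps[j], ps2[i] - ps2[j], i - j
        return s2 - s * s / m

    INF = float("inf")
    # opt[k][i]: best error for data[:i] using k buckets.
    opt = [[INF] * (n + 1) for _ in range(B + 1)]
    back = [[0] * (n + 1) for _ in range(B + 1)]
    opt[0][0] = 0.0
    for k in range(1, B + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):
                cand = opt[k - 1][j] + sse(j, i)
                if cand < opt[k][i]:
                    opt[k][i], back[k][i] = cand, j
    # Recover bucket boundaries by walking the back-pointers.
    bounds, i = [], n
    for k in range(B, 0, -1):
        j = back[k][i]
        bounds.append((j, i))
        i = j
    return opt[B][n], list(reversed(bounds))

# Example: three constant runs admit a zero-error 3-bucket histogram.
err, buckets = optimal_histogram([1, 1, 1, 9, 9, 2, 2], B=3)
# err == 0.0, buckets == [(0, 3), (3, 5), (5, 7)]
```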

    Private Approximation of Search Problems

    Many approximation algorithms have been presented in recent decades for hard search problems. The focus of this paper is on cryptographic applications, where it is desirable to design algorithms that do not leak unnecessary information. Specifically, we are interested in private approximation algorithms: efficient algorithms whose output does not leak information not implied by the optimal solutions to the search problems. Privacy requirements add constraints on the approximation algorithms; in particular, known approximation algorithms usually leak a lot of information. For functions, [Feigenbaum et al., ICALP 2001] presented a natural requirement that a private algorithm should not leak information not implied by the original function. Generalizing this requirement to search problems is not straightforward, as an input may have many different outputs. We present a new definition that captures a minimal privacy requirement for such algorithms: applied to an input instance, an algorithm should not leak any information that is not implied by the instance's collection of exact solutions. Although our privacy requirement seems minimal, we show that for well-studied problems, such as vertex cover and maximum exact 3SAT, private approximation algorithms are unlikely to exist even for poor approximation ratios. Similar to [Halevi et al., STOC 2001], we define a relaxed notion of approximation algorithms that leak (little) information, and demonstrate the applicability of this notion by showing near-optimal approximation algorithms for maximum exact 3SAT that leak little information.

    Private approximation of clustering and vertex cover

    Private approximation of search problems deals with finding approximate solutions to search problems while disclosing as little information as possible. The focus of this work is on private approximation of the vertex cover problem and of two well-studied clustering problems, k-center and k-median. Vertex cover was considered in [Beimel, Carmi, Nissim, and Weinreb, STOC 2006], and we improve their infeasibility results. Clustering algorithms are frequently applied to sensitive data, and hence are of interest in the contexts of secure computation and private approximation. We show that these problems do not admit private approximations, or even approximation algorithms that leak a significant number of bits. For the vertex cover problem we show a tight infeasibility result: every algorithm that ρ(n)-approximates vertex cover must leak Ω(n/ρ(n)) bits (where n is the number of vertices in the graph). For the clustering problems we prove that even approximation algorithms with a poor approximation ratio must leak Ω(n) bits (where n is the number of points in the instance). For these results we develop new proof techniques, which are simpler and more intuitive than those in Beimel et al., and yet allow stronger infeasibility results. Our proofs rely on the hardness of the promise problem where a unique optimal solution exists [Valiant and Vazirani, Theoretical Computer Science, 1986], and on the hardness of approximating witnesses for NP-hard problems [Kumar and Sivakumar].