    Privacy via the Johnson-Lindenstrauss Transform

    Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask whether it is possible for party A to publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method involves projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy, where the more privacy is desired, the larger the variance of the Gaussian noise. Further, we show how to approximate the true distances between users via only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods, such as randomized response, and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is broader than preserving distances, this work shows that private distance computation is an achievable goal.
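    As a rough illustration of the pipeline the abstract describes, here is a minimal NumPy sketch. The Achlioptas-style sparse matrix, the noise scale sigma, the parameter sizes, and the bias correction are illustrative assumptions, not the paper's exact mechanism or its calibration of sigma to the privacy budget.

    import numpy as np

    rng = np.random.default_rng(0)

    def sparse_jl_matrix(k, d, density=1/3):
        """Achlioptas-style sparse JL matrix with entries in {-1, 0, +1},
        scaled so that E[||Px||^2] = ||x||^2."""
        signs = rng.choice([-1.0, 0.0, 1.0], size=(k, d),
                           p=[density / 2, 1 - density, density / 2])
        return signs / np.sqrt(density * k)

    # Party A: project every user's bit vector with the SAME matrix, then
    # perturb each coordinate with Gaussian noise; larger sigma means more
    # privacy, per the abstract.
    d, k, sigma = 1_000, 64, 0.5                    # illustrative sizes only
    users = rng.integers(0, 2, size=(100, d)).astype(float)
    P = sparse_jl_matrix(k, d)
    published = users @ P.T + rng.normal(0.0, sigma, size=(100, k))

    def est_sq_dist(y_a, y_b):
        """Party B: estimate ||x_a - x_b||^2 from published vectors only,
        subtracting the expected inflation 2*k*sigma^2 caused by the noise."""
        return np.sum((y_a - y_b) ** 2) - 2 * k * sigma**2

    print(est_sq_dist(published[0], published[1]))

    The noise correction works because the difference of two independent noise vectors adds 2*k*sigma^2 to the expected squared distance, while the JL projection preserves it in expectation.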

    Contributions to privacy in web search engines

    Web search engines collect and store information about their users in order to tailor their services to those users' needs. In return for a personalized service, however, users lose control over their own data: search logs can disclose sensitive information and even the users' identities, creating risks of privacy breaches. In this thesis we discuss how to limit these disclosure risks while minimizing the information loss. The first part of the thesis focuses on methods to prevent the gathering of information by web search engines. Since search logs are needed in order to receive an accurate service, the aim is to provide logs that are still suitable for personalization. To this end, we propose a protocol which uses a social network to obfuscate users' profiles. The second part deals with the dissemination of search logs. We propose microaggregation techniques which allow the publication of search logs, providing k-anonymity while minimizing the information loss.
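    The thesis targets query logs; purely as a hedged illustration of the general microaggregation idea (a standard MDAV-like scheme, not necessarily the thesis's exact variant), here is a toy sketch that assumes records have already been embedded as numeric vectors.

    import numpy as np

    def microaggregate(X, k):
        """Simplified MDAV-style microaggregation: repeatedly group the k
        records closest to the record farthest from the current centroid,
        and replace each group by its mean."""
        X = np.asarray(X, dtype=float)
        out = np.empty_like(X)
        remaining = list(range(len(X)))
        while len(remaining) >= 2 * k:
            centroid = X[remaining].mean(axis=0)
            far = max(remaining, key=lambda i: np.linalg.norm(X[i] - centroid))
            group = sorted(remaining,
                           key=lambda i: np.linalg.norm(X[i] - X[far]))[:k]
            out[group] = X[group].mean(axis=0)
            remaining = [i for i in remaining if i not in group]
        out[remaining] = X[remaining].mean(axis=0)  # final group, size k..2k-1
        return out

    # Each released record now coincides with at least k-1 others.
    released = microaggregate(np.random.rand(50, 4), k=5)

    Because every released record is identical to at least k-1 others, an attacker cannot single out any individual record, while the group means keep the released data close to the originals.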

    Tight on Budget? Tight Bounds for r-Fold Approximate Differential Privacy

    Many applications, such as anonymous communication systems, privacy-enhancing database queries, or privacy-enhancing machine-learning methods, require robust privacy guarantees under thousands and sometimes millions of observations. The notion of r-fold approximate differential privacy (ADP) offers a well-established framework with a precise characterization of the degree of privacy after r observations by an attacker. However, existing bounds for r-fold ADP are loose and, if used for estimating the required degree of noise for an application, can lead to over-cautious choices of perturbation randomness and thus to suboptimal utility or overly high costs. We present a numerical and widely applicable method for capturing the privacy loss of differentially private mechanisms under composition, which we call privacy buckets. With privacy buckets we compute provable upper and lower bounds on ADP for a given number of observations. We compare our bounds with state-of-the-art bounds for r-fold ADP, including Kairouz, Oh, and Viswanath's composition theorem (KOV), concentrated differential privacy, and the moments accountant. While KOV proved optimal bounds for heterogeneous adaptive k-fold composition, we show that for concrete sequences of mechanisms tighter bounds can be derived by taking the mechanisms' structure into account. We compare previous bounds for the Laplace mechanism, the Gauss mechanism, a timing leakage reduction mechanism, and stochastic gradient descent, and we significantly improve over their results (except that we match the KOV bound for the Laplace mechanism, for which it seems tight). Our lower bounds almost meet our upper bounds, showing that no significantly tighter bounds are possible.
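    To make the bucketing idea concrete, here is a minimal sketch under strong simplifying assumptions (a discrete mechanism, losses rounded up to a grid, no explicit error term or distinguishing events); the paper's actual algorithm tracks provable upper and lower bounds and is considerably more careful. Since privacy losses add and probabilities multiply under composition, the bucket arrays simply convolve.

    import numpy as np

    def pld(p, q, width):
        """Discretized privacy-loss distribution of a mechanism with
        neighbouring output distributions p, q: bucket i holds the mass of
        outcomes whose loss ln(p/q) rounds up to (i + offset) * width.
        Rounding up keeps the resulting delta an over-estimate."""
        idx = np.ceil(np.log(p / q) / width).astype(int)
        offset = int(idx.min())
        probs = np.zeros(int(idx.max()) - offset + 1)
        for i, mass in zip(idx - offset, p):
            probs[i] += mass
        return probs, offset

    def compose(a, b):
        """Composition of two mechanisms: losses add, so buckets convolve."""
        (pa, oa), (pb, ob) = a, b
        return np.convolve(pa, pb), oa + ob

    def delta(pl, width, eps):
        """ADP bound: delta(eps) = E_P[(1 - e^{eps - L})_+]."""
        probs, offset = pl
        losses = (np.arange(len(probs)) + offset) * width
        return float(np.sum(probs * np.clip(1.0 - np.exp(eps - losses),
                                            0.0, None)))

    # Example: 16-fold composition of a discretized Gaussian mechanism
    # (sensitivity 0.5, sigma 1), composed by repeated squaring.
    width = 1e-2
    x = np.linspace(-8.0, 8.0, 2001)
    p = np.exp(-0.5 * x**2)
    p /= p.sum()
    q = np.exp(-0.5 * (x - 0.5) ** 2)
    q /= q.sum()
    acc = pld(p, q, width)
    for _ in range(4):               # 2^4 = 16 observations
        acc = compose(acc, acc)
    print(delta(acc, width, eps=3.0))

    Making the grid finer tightens the over-estimate, at the cost of longer convolutions; bounding the discretization and truncation error explicitly is exactly what the privacy-buckets construction in the paper adds on top of this sketch.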