    Transparent Privacy is Principled Privacy

    Differential privacy revolutionizes the way we think about statistical disclosure limitation. Among the benefits it brings, one is particularly profound: under this formal approach to privacy, the mechanism by which data is privatized can be spelled out in full transparency without sacrificing the privacy guarantee. Curators of open-source demographic and scientific data are in a position to offer privacy without obscurity. This paper supplies a technical treatment of the pitfalls of obscure privacy and establishes transparent privacy as a prerequisite for drawing correct statistical inference. It advocates conceiving of transparent privacy as a dynamic component that can improve data quality from the total survey error perspective, and discusses the limited statistical usability of mere procedural transparency, which may arise when dealing with mandated invariants. Transparent privacy is the only viable path towards principled inference from privatized data releases. Its arrival marks great progress towards improved reproducibility, accountability and public trust.
    Comment: 2 figure

    Supervised learning using a symmetric bilinear form for record linkage

    Record linkage is used to link records in two different files that correspond to the same individuals. These algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual. The degree of success when linking the original (unprotected) data with the protected data gives an estimate of the disclosure risk. In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form and the supervised learning method is formalized as an optimization problem. The target of the optimization problem is to find the values of the aggregation parameters that maximize the number of re-identifications (or correct links). We evaluate and compare our proposal with other non-parameterized variations of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most used approaches for this purpose). Additionally, we compare it with other previously presented parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. From these comparisons we show that the proposed aggregation operator outperforms, or at least achieves results similar to, the other parameterized operators. We also study which conditions on the optimization problem are necessary for the described aggregation functions to be considered metric functions.
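
    The linkage step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy records and the fixed weight matrix are all assumptions made here, and the paper learns the weights through an optimization problem rather than fixing them. With the identity matrix as weights, the bilinear form reduces to the squared Euclidean distance.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear_distance(a: Vector, b: Vector, weights: Matrix) -> float:
    """Return (a - b)^T W (a - b) for a symmetric weight matrix W."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return sum(diff[i] * weights[i][j] * diff[j]
               for i in range(n) for j in range(n))


def link_records(original: Sequence[Vector],
                 protected: Sequence[Vector],
                 weights: Matrix) -> List[int]:
    """For each protected record, return the index of the closest original record."""
    return [
        min(range(len(original)),
            key=lambda i: bilinear_distance(original[i], p, weights))
        for p in protected
    ]


def correct_links(links: Sequence[int]) -> int:
    """Count re-identifications, assuming protected[j] was derived from original[j]."""
    return sum(1 for j, i in enumerate(links) if i == j)


# Hypothetical toy data: each protected record is a perturbed original record.
original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
protected = [[1.1, 10.5], [1.9, 19.0], [3.2, 31.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed weights, not learned

links = link_records(original, protected, identity)
print(links, correct_links(links))
```

    The share of correct links is the disclosure-risk estimate the abstract refers to; a supervised method would search over the entries of the weight matrix to maximize that count.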

    Population Synthesis via k-Nearest Neighbor Crossover Kernel

    The recent development of multi-agent simulations brings about a need for population synthesis: the task of reconstructing an entire population from a sampling survey of limited size (1% or so), which supplies the initial conditions from which simulations begin. This paper presents a new kernel density estimator for this task. Our method is an analogue of the classical Breiman-Meisel-Purcell estimator, but employs novel techniques to harness the large number of degrees of freedom required to model high-dimensional, nonlinearly correlated datasets: the crossover kernel, the k-nearest neighbor restriction of the kernel construction set, and the bagging of kernels. Its performance as a statistical estimator is examined on real and synthetic datasets. We provide an "optimization-free" parameter selection rule for our method, a theory of how our method works, and a computational cost analysis. To demonstrate its usefulness as a population synthesizer, our method is applied to a household synthesis task for an urban micro-simulator.
    Comment: 10 pages, 4 figures, IEEE International Conference on Data Mining (ICDM) 201

    Gaussian Differential Privacy on Riemannian Manifolds

    We develop an advanced approach for extending Gaussian Differential Privacy (GDP) to general Riemannian manifolds. The concept of GDP stands out as a prominent privacy definition that strongly warrants extension to manifold settings, due to its central limit properties. By harnessing the power of the renowned Bishop-Gromov theorem in geometric analysis, we propose a Riemannian Gaussian distribution that integrates the Riemannian distance, allowing us to achieve GDP in Riemannian manifolds with bounded Ricci curvature. To the best of our knowledge, this work marks the first instance of extending the GDP framework to accommodate general Riemannian manifolds, encompassing curved spaces, and circumventing the reliance on tangent space summaries. We provide a simple algorithm to evaluate the privacy budget μ on any one-dimensional manifold and introduce a versatile Markov Chain Monte Carlo (MCMC)-based algorithm to calculate μ on any Riemannian manifold with constant curvature. Through simulations on one of the most prevalent manifolds in statistics, the unit sphere S^d, we demonstrate the superior utility of our Riemannian Gaussian mechanism in comparison to the previously proposed Riemannian Laplace mechanism for implementing GDP.
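
    To make "a Riemannian Gaussian distribution that integrates the Riemannian distance" concrete on the unit sphere, the sketch below evaluates an unnormalized density of the form exp(-d(x, η)² / (2σ²)), where d is the geodesic (great-circle) distance. The exact form of the density, the function names and the parameter sigma are assumptions made for illustration; the abstract does not spell them out, and nothing here computes the privacy budget μ.

```python
import math
from typing import Sequence


def geodesic_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Great-circle distance between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(x, y))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.acos(dot)


def unnormalized_density(x: Sequence[float],
                         center: Sequence[float],
                         sigma: float) -> float:
    """Assumed form exp(-d(x, center)^2 / (2 sigma^2)), without the normalizing constant."""
    d = geodesic_distance(x, center)
    return math.exp(-d * d / (2.0 * sigma * sigma))


# Hypothetical points on S^2: the density is largest at the center and decays with distance.
north = (0.0, 0.0, 1.0)
equator = (1.0, 0.0, 0.0)
print(geodesic_distance(north, equator))
print(unnormalized_density(north, north, 0.5), unnormalized_density(equator, north, 0.5))
```

    Using the geodesic distance directly, rather than a Euclidean distance in a tangent space, matches the abstract's stated aim of avoiding tangent space summaries.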

    An Architectural Solution of Assistance e-Services for Diabetes Diet

    The aim of this paper is to outline the requirements and main architecture of a tool for determining the nutrition facts of food for people with Type 2 Diabetes. Type 2 Diabetes is used here only to define the target audience, a “mass of people” who may have little in common in terms of computer usage skills. The characteristics of the target audience (its huge size and its diversity of habits, behaviors and computer usage skills) require a solution based on web services, delivered at least partly as a standalone/portable application, built from web services and provided with means for disseminating and using domain knowledge.
    Keywords: Software Architecture, Knowledge Management, SIK, Business Rules, Type 2 Diabetes

    Autonomic log/restore for advanced optimistic simulation systems

    In this paper we address state recoverability in optimistic simulation systems by presenting an autonomic log/restore architecture. Our proposal is unique in that it jointly provides the following features: (i) log/restore operations are carried out in a completely transparent manner to the application programmer, (ii) the simulation-object state can be scattered across dynamically allocated non-contiguous memory chunks, (iii) two differentiated operating modes, incremental vs non-incremental, coexist via transparent, optimized run-time management of dual versions of the same application layer, with dynamic selection of the best suited operating mode in different phases of the optimistic simulation run, and (iv) determination of the best suited mode for any time frame is carried out on the basis of an innovative modeling/optimization approach that takes into account stability of each operating mode vs variations of the model execution parameters. © 2010 IEEE