7 research outputs found

    An Evolutionary Optimization Approach for Categorical Data Protection

    Get PDF
    The continuous growing amount of public sensible data has increased the risk of breaking the privacy of people or institutions in those datasets. Many protection methods have been developed to solve this problem by either distorting or generalizing data but taking into account the difficult tradeoff between data utility (information loss) and protection against disclosure (disclosure risk). In this paper we present an optimization approach for data protection based on an evolutionary algorithm which is guided by a combination of information loss and disclosure risk measures. In this way, state-of-the-art protection methods are combined to obtain new data protections with a better trade-off between these two measures. The paper presents several experimental results that assess the performance of our approach

    Attribute selection in multivariate microaggregation

    Full text link

    Revisiting distance-based record linkage for privacy-preserving release of statistical datasets

    Get PDF
    Statistical Disclosure Control (SDC, for short) studies the problem of privacy-preserving data publishing in cases where the data is expected to be used for statistical analysis. An original dataset T containing sensitive information is transformed into a sanitized version T' which is released to the public. Both utility and privacy aspects are very important in this setting. For utility, T' must allow data miners or statisticians to obtain similar results to those which would have been obtained from the original dataset T. For privacy, T' must significantly reduce the ability of an adversary to infer sensitive information on the data subjects in T. One of the main a-posteriori measures that the SDC community has considered up to now when analyzing the privacy offered by a given protection method is the Distance-Based Record Linkage (DBRL) risk measure. In this work, we argue that the classical DBRL risk measure is insufficient. For this reason, we introduce the novel Global Distance-Based Record Linkage (GDBRL) risk measure. We claim that this new measure must be evaluated alongside the classical DBRL measure in order to better assess the risk in publishing T' instead of T. After that, we describe how this new measure can be computed by the data owner and discuss the scalability of those computations. We conclude by extensive experimentation where we compare the risk assessments offered by our novel measure as well as by the classical one, using well-known SDC protection methods. Those experiments validate our hypothesis that the GDBRL risk measure issues, in many cases, higher risk assessments than the classical DBRL measure. In other words, relying solely on the classical DBRL measure for risk assessment might be misleading, as the true risk may be in fact higher. Hence, we strongly recommend that the SDC community considers the new GDBRL risk measure as an additional measure when analyzing the privacy offered by SDC protection algorithms.Postprint (author's final draft

    Synthetic data methods for disclosure limitation

    Get PDF

    Rethinking rank swapping to decrease disclosure risk

    No full text
    Nowadays, the need for privacy motivates the use of methods that allow to protect a microdata file both minimizing the disclosure risk and preserving the data utility. A very popular microdata protection method is rank swapping. Record linkage is the standard mechanism used to measure the disclosure risk of a microdata protection method. In this paper we present a new record linkage method, specific for rank swapping, which obtains more links than standard ones. The consequence is that rank swapping has a higher disclosure risk than believed up to now. Motivated by this, we present two new variants of the rank swapping method, which make the new record linkage technique unsuitable. Therefore, the real disclosure risk of these new methods is lower than the standard rank swapping. © 2007 Elsevier B.V. All rights reserved.Partial support by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02) is acknowledged. Jordi Nin wants to thank the Spanish Council for Scientific Research (CSIC) for his I3P grant. Appendix APeer Reviewe

    Implementing privacy-preserving filters in the MOA stream mining framework

    Get PDF
    [CATALÀ] S'han implementat mètodes d'SDC en quatre filtres de privacitat pel software MOA. Els algorismes han estat adaptats de solucions conegudes per habilitar el seu ús en entorns de processament de fluxos. Finalment, han estat avaluats en termes del risc de revelació i la pèrdua d'informació.[ANGLÈS] Four MOA privacy-preserving filters have been developed to implement some SDC methods. The algorithms have been adapted from well-known solutions to enable their use in streaming settings. Finally, they have been benchmarked to assess their quality in terms of disclosure risk and information loss
    corecore