An Evolutionary Optimization Approach for Categorical Data Protection
The continuously growing amount of publicly released sensitive data has increased the risk of breaching the privacy of the people or institutions described in those datasets. Many protection methods have been developed to address this problem, either by distorting or by generalizing the data, while taking into account the difficult trade-off between data utility (information loss) and protection against disclosure (disclosure risk). In this paper we present an optimization approach for data protection based on an evolutionary algorithm guided by a combination of information loss and disclosure risk measures. In this way, state-of-the-art protection methods are combined to obtain new data protections with a better trade-off between these two measures. The paper presents several experimental results that assess the performance of our approach.
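To make the idea of an evolutionary search "guided by a combination of information loss and disclosure risk" concrete, here is a minimal sketch under several assumptions. The fitness is a hypothetical weighted sum `alpha * IL + (1 - alpha) * DR` (lower is better); IL is the fraction of changed cells and DR is a Hamming-distance record-linkage rate. These measures, the single-cell mutation operator, and the names `fitness`, `mutate`, `evolve` and `alpha` are illustrative stand-ins, not the paper's actual operators or measures. The initial population plays the role of protections produced by existing methods.

```python
import random

def information_loss(original, protected):
    # Hypothetical IL: fraction of categorical cells whose value was changed.
    cells = sum(len(r) for r in original)
    changed = sum(a != b for ro, rp in zip(original, protected) for a, b in zip(ro, rp))
    return changed / cells

def disclosure_risk(original, protected):
    # Hypothetical DR: expected fraction of protected records whose nearest
    # original record (Hamming distance) is the true one; ties share credit.
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))
    hits = 0.0
    for i, rp in enumerate(protected):
        dists = [hamming(ro, rp) for ro in original]
        best = min(dists)
        candidates = [j for j, d in enumerate(dists) if d == best]
        if i in candidates:
            hits += 1 / len(candidates)
    return hits / len(protected)

def fitness(original, protected, alpha=0.5):
    # Weighted combination of both measures; lower is better.
    return alpha * information_loss(original, protected) + (1 - alpha) * disclosure_risk(original, protected)

def mutate(protected, domains, rng):
    # Copy the individual and replace one random cell with a random category.
    child = [list(r) for r in protected]
    i = rng.randrange(len(child))
    j = rng.randrange(len(child[0]))
    child[i][j] = rng.choice(domains[j])
    return child

def evolve(original, initial_population, domains, generations=200, seed=0):
    # Steady-state elitist loop: add one mutated child, then drop the worst
    # individual, so the best fitness in the population never gets worse.
    rng = random.Random(seed)
    population = [[list(r) for r in p] for p in initial_population]
    for _ in range(generations):
        child = mutate(rng.choice(population), domains, rng)
        population.append(child)
        population.sort(key=lambda p: fitness(original, p))
        population.pop()
    return population[0]
```

Because the loop is elitist, the returned protection is never worse (under this fitness) than the best of the seed protections; changing `alpha` shifts the search toward utility or toward privacy.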
Revisiting distance-based record linkage for privacy-preserving release of statistical datasets
Statistical Disclosure Control (SDC, for short) studies the problem of privacy-preserving data publishing in cases where the data is expected to be used for statistical analysis. An original dataset T containing sensitive information is transformed into a sanitized version T' which is released to the public. Both utility and privacy aspects are very important in this setting. For utility, T' must allow data miners or statisticians to obtain results similar to those which would have been obtained from the original dataset T. For privacy, T' must significantly reduce the ability of an adversary to infer sensitive information about the data subjects in T. One of the main a posteriori measures that the SDC community has considered up to now when analyzing the privacy offered by a given protection method is the Distance-Based Record Linkage (DBRL) risk measure. In this work, we argue that the classical DBRL risk measure is insufficient. For this reason, we introduce the novel Global Distance-Based Record Linkage (GDBRL) risk measure. We claim that this new measure must be evaluated alongside the classical DBRL measure in order to better assess the risk of publishing T' instead of T. We then describe how this new measure can be computed by the data owner and discuss the scalability of those computations. We conclude with extensive experiments in which we compare the risk assessments offered by our novel measure with those of the classical one, using well-known SDC protection methods. Those experiments validate our hypothesis that the GDBRL risk measure yields, in many cases, higher risk assessments than the classical DBRL measure. In other words, relying solely on the classical DBRL measure for risk assessment might be misleading, as the true risk may in fact be higher.
Hence, we strongly recommend that the SDC community consider the new GDBRL risk measure as an additional measure when analyzing the privacy offered by SDC protection algorithms.
Postprint (author's final draft).
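The contrast between a per-record and a global linkage attack can be sketched on numeric records. Classical DBRL links each protected record independently to its nearest original record. For the global variant, this sketch assumes the attacker instead picks the one-to-one matching between T and T' with minimum total distance; that reading of GDBRL, the Euclidean distance, and the function names `dbrl_risk` and `global_dbrl_risk` are assumptions for illustration. The brute-force search over permutations only works for tiny datasets; a realistic version would use an assignment solver such as `scipy.optimize.linear_sum_assignment`.

```python
import itertools
import math

def dbrl_risk(original, protected):
    # Classical DBRL: link each protected record to its nearest original
    # record (first index wins on ties) and count correct links.
    hits = 0
    for i, rp in enumerate(protected):
        nearest = min(range(len(original)), key=lambda j: math.dist(original[j], rp))
        hits += nearest == i
    return hits / len(protected)

def global_dbrl_risk(original, protected):
    # Assumed global variant: choose the one-to-one matching (perm[i] is the
    # original record linked to protected[i]) minimizing total distance.
    # Brute force over all permutations, so only usable for very small n.
    n = len(original)
    best_perm = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(math.dist(original[perm[i]], protected[i]) for i in range(n)),
    )
    return sum(best_perm[i] == i for i in range(n)) / n

# Toy example: both protected records are closest to original[0],
# but the global matching still recovers the true pairing.
original = [(0.0,), (10.0,)]
protected = [(4.0,), (4.5,)]
print(dbrl_risk(original, protected), global_dbrl_risk(original, protected))  # 0.5 1.0
```

In the toy example the per-record attack links both protected records to the same original and gets only one right, whereas the global matching, which forbids reusing an original record, links both correctly. This is one way a global measure can report a higher risk than the classical one.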
Rethinking rank swapping to decrease disclosure risk
Nowadays, the need for privacy motivates the use of methods that protect a microdata file while both minimizing the disclosure risk and preserving the data utility. A very popular microdata protection method is rank swapping. Record linkage is the standard mechanism used to measure the disclosure risk of a microdata protection method. In this paper we present a new record linkage method, specific to rank swapping, which obtains more links than the standard ones. As a consequence, rank swapping has a higher disclosure risk than previously believed. Motivated by this, we present two new variants of the rank swapping method against which the new record linkage technique is ineffective. Therefore, the real disclosure risk of these new methods is lower than that of standard rank swapping. © 2007 Elsevier B.V. All rights reserved.
Partial support by the Spanish MEC (projects ARES – CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-02) is acknowledged. Jordi Nin wishes to thank the Spanish Council for Scientific Research (CSIC) for his I3P grant.
Peer reviewed.
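For readers unfamiliar with the base method, here is a minimal sketch of standard rank swapping on a single numeric attribute, as the technique is commonly described: values are ranked, and each value is swapped with a not-yet-swapped value whose rank lies at most `p` percent of the records away. The function name `rank_swap`, the greedy pairing order, and the way the window is rounded are assumptions; this is neither the paper's new variants nor its rank-swapping-specific record linkage.

```python
import random

def rank_swap(values, p, seed=None):
    # Standard rank swapping on one attribute (sketch). Each value is swapped
    # at most once, with a partner whose rank differs by at most `window`.
    rng = random.Random(seed)
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # record index at each rank
    window = max(1, int(n * p / 100))                  # max rank distance
    swapped = [False] * n                              # indexed by rank
    result = list(values)
    for r in range(n):
        if swapped[r]:
            continue
        candidates = [s for s in range(r + 1, min(n, r + window + 1)) if not swapped[s]]
        if not candidates:
            continue  # nothing left to swap with; the value stays in place
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        result[i], result[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return result
```

The output is a permutation of the input values (so univariate statistics are preserved exactly), and every protected value lies within a bounded rank distance of the original. That bounded window is the kind of structural knowledge a linkage attack tailored to rank swapping can plausibly exploit.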
Implementing privacy-preserving filters in the MOA stream mining framework
Four privacy-preserving filters implementing SDC methods have been developed for the MOA software. The algorithms have been adapted from well-known solutions to enable their use in stream-processing settings. Finally, they have been benchmarked to assess their quality in terms of disclosure risk and information loss.
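The abstract does not say which SDC methods the four filters implement, so the following is only an illustration of what "adapting a well-known solution to a streaming setting" can mean. It assumes univariate microaggregation as the method and buffers arriving values in blocks of `k`, releasing each value as its block mean. The class `StreamMicroaggregator` and its methods are hypothetical and are not MOA's (Java) filter API.

```python
from statistics import fmean

class StreamMicroaggregator:
    # Hypothetical streaming filter: values are buffered in arrival order and,
    # once k have accumulated, each is released as the block's mean.
    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self.buffer = []

    def process(self, value):
        # Returns the released (protected) values, or [] while still buffering.
        self.buffer.append(value)
        if len(self.buffer) < self.k:
            return []
        released = [fmean(self.buffer)] * self.k
        self.buffer.clear()
        return released

    def flush(self):
        # End of stream: aggregate whatever is left (possibly fewer than k,
        # which gives weaker protection for that final group).
        if not self.buffer:
            return []
        released = [fmean(self.buffer)] * len(self.buffer)
        self.buffer.clear()
        return released

f = StreamMicroaggregator(k=3)
out = []
for v in [1, 2, 3, 10, 20, 30, 5]:
    out += f.process(v)
out += f.flush()
print(out)  # [2.0, 2.0, 2.0, 20.0, 20.0, 20.0, 5.0]
```

Grouping by arrival order rather than by similarity is what makes this usable on an unbounded stream with bounded memory and a delay of at most `k` records, but it typically loses more information than batch microaggregation, which can sort or cluster the whole dataset first.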