
    Safe Sharing for Sensitive Data

    This workshop focused on the question of when and how human subjects' data can be safely shared. It introduced the basics of data anonymization and discussed how to tell whether a dataset has been de-identified. Case studies of successful anonymization, along with some spectacular failures, were shared.

    The Mathematics of Risk: An introduction to guaranteed data de-identification

    This webinar is devoted to the mathematical and theoretical underpinnings of guaranteed data anonymization. Topics covered include an overview of identifiers and quasi-identifiers, an introduction to k-anonymity, a look at some cases where k-anonymity breaks down, and anonymization hierarchies. The presenter describes a method for assessing a survey dataset for anonymization using standard statistical software and considers the question of anonymization overkill. Much of the academic material on data anonymization is quite abstract and aimed at computer scientists, while material aimed at data curators does not always consider recent developments. This webinar is intended to help bridge that gap.
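
    A minimal sketch of the kind of assessment described above, assuming a pandas DataFrame and hypothetical quasi-identifier column names (an illustration, not the presenter's method):

        # Minimal k-anonymity check: a dataset is k-anonymous with respect to its
        # quasi-identifiers if every combination of quasi-identifier values
        # occurs in at least k records. Column names below are hypothetical.
        import pandas as pd

        def achieved_k(df: pd.DataFrame, quasi_identifiers: list) -> int:
            """Size of the smallest equivalence class, i.e. the k the dataset actually achieves."""
            return int(df.groupby(quasi_identifiers).size().min())

        survey = pd.DataFrame({
            "age_band":   ["20-29", "20-29", "30-39", "30-39", "30-39"],
            "zip_prefix": ["100",   "100",   "100",   "200",   "200"],
            "income":     [31000,   28000,   54000,   61000,   47000],
        })
        print(achieved_k(survey, ["age_band", "zip_prefix"]))  # 1 -> not even 2-anonymous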

    Data Anonymization: K-anonymity Sensitivity Analysis

    Digitization is now everywhere, spreading across central governments and local authorities alike. It is hoped that using open government data for scientific research will enhance the public good and social justice. Taking into account the recently adopted European General Data Protection Regulation, the big challenge in Portugal and other European countries is how to strike the right balance between personal data privacy and data value for research. This work presents a sensitivity study of a data anonymization procedure applied to real open government data from the Brazilian higher education evaluation system. The ARX k-anonymization algorithm was run with and without generalization of some research-value variables. Analysis of the amount of data/information lost and the risk of re-identification suggests that the anonymization process may lead to the under-representation of minorities and sociodemographically disadvantaged groups. The work should enable scientists to improve the balance among risk, data usability, and contributions to public-good policies and practices.
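
    As a rough Python illustration of the trade-off measured in the study (ARX itself is a Java toolkit, and this is not its algorithm; the column names are hypothetical):

        # Generalizing a quasi-identifier lowers re-identification risk but
        # discards detail. Average prosecutor-style risk = mean over records of
        # 1 / (size of the record's equivalence class) = #classes / #records.
        import pandas as pd

        def avg_reidentification_risk(df, quasi_identifiers):
            classes = df.groupby(quasi_identifiers).size()
            return len(classes) / len(df)

        students = pd.DataFrame({
            "age":    [22, 22, 23, 25, 31, 31],
            "region": ["N", "N", "N", "S", "S", "S"],
            "score":  [3.1, 3.4, 2.9, 3.8, 3.3, 3.0],
        })

        # Generalization: collapse exact age into coarse bands (information loss).
        generalized = students.assign(
            age=pd.cut(students["age"], bins=[20, 25, 35],
                       labels=["21-25", "26-35"]).astype(str))

        print(avg_reidentification_risk(students,    ["age", "region"]))  # ~0.67
        print(avg_reidentification_risk(generalized, ["age", "region"]))  # 0.50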

    Implementation and evaluation of microaggregation algorithms for categorical data

    Different data anonymization algorithms have been proposed in the literature, but it is not always easy for practitioners to understand which one is better suited to a given situation. In an increasingly digitalised world, the need for data privacy is apparent. Data scientists have contributed much previous work to ensuring privacy for numerical data attributes in published datasets. However, work with categorical data tends to significantly affect data utility in terms of information loss, and less research is available on feasible approaches. This thesis aims to describe, implement, and compare multiple microaggregation algorithms for categorical data. To achieve the goals of the thesis and provide valuable output, multiple new proposals for handling categorical data, based on the Mondrian algorithm, are presented. The proposals were found to fare well compared to some previously presented algorithms in terms of algorithm execution time, potential information loss, and re-identification risk.
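
    A toy sketch of categorical microaggregation (not the Mondrian variants proposed in the thesis): group records into clusters of at least k and replace each categorical value with the cluster's most frequent value.

        from collections import Counter

        def microaggregate(records, attributes, k=3):
            # Sort so similar records end up adjacent (a crude stand-in for
            # Mondrian-style partitioning), then cut into chunks of size >= k.
            ordered = sorted(records, key=lambda r: tuple(r[a] for a in attributes))
            groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
            if len(groups) > 1 and len(groups[-1]) < k:   # merge an undersized tail
                groups[-2].extend(groups.pop())
            result = []
            for group in groups:
                modes = {a: Counter(r[a] for r in group).most_common(1)[0][0]
                         for a in attributes}
                result.extend({**r, **modes} for r in group)  # replace values by the group mode
            return result

        data = [{"city": c, "degree": d} for c, d in
                [("Oslo", "BSc"), ("Oslo", "MSc"), ("Bergen", "BSc"),
                 ("Bergen", "MSc"), ("Oslo", "PhD")]]
        print(microaggregate(data, ["city", "degree"], k=2))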

    Privacy Preserving Versus Utility Preserving in Data Anonymization: a study in higher education

    In the digital world, all human activity leaves a trace of data that is increasingly valued for the evaluation and definition of strategies in a wide range of domains. Sharing these data, while socially relevant, requires respect for individual privacy and therefore their anonymization. Current privacy laws and regulations offer limited guidance for dealing with the vast range of data types or with re-identification techniques. This work illustrates an anonymization process, comparing, for several privacy models, the information loss and the utility of the resulting dataset. Finding a balance between privacy and utility is a challenge that is more easily met by those who best understand the meaning of the data and the objectives to be achieved with them. This work is partially funded by FCT/MCTES through national funds and, where applicable, co-funded by EU funds under projects UIDB/50008/2020 and CEMAPRE/REM - UIDB/05069/2020.
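
    One simple way to visualize the trade-off discussed above (an illustration, not the study's own methodology): enforce k-anonymity by suppressing records whose quasi-identifier combination occurs fewer than k times, and watch how much data survives as k grows. Column names are hypothetical.

        import pandas as pd

        def suppression_rate(df, quasi_identifiers, k):
            # Share of records falling in equivalence classes smaller than k,
            # i.e. records that would have to be suppressed to reach k-anonymity.
            counts = df.groupby(quasi_identifiers).size().rename("class_size").reset_index()
            sized = df.merge(counts, on=quasi_identifiers)
            return float((sized["class_size"] < k).mean())

        grads = pd.DataFrame({
            "field":  ["CS", "CS", "CS", "Law", "Law", "Math"],
            "gender": ["F",  "F",  "M",  "F",   "M",   "F"],
            "salary": [40,   42,   39,   35,    37,    41],
        })
        for k in (2, 3, 5):
            print(f"k={k}: {suppression_rate(grads, ['field', 'gender'], k):.0%} suppressed")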

    Anonimização de Dados em Educação (Data Anonymization in Education)

    Interest in data privacy is growing, and so is the quantity of data collected. These data, collected and stored electronically, contain information related to all aspects of our lives, frequently including sensitive information such as financial records, activity on social networks, location traces collected by our mobile phones, and even medical records. Consequently, it becomes paramount to ensure the best protection for these data, so that no harm comes to individuals even if the data become publicly available. To achieve this, it is necessary to prevent linkage between records in a dataset and real-world individuals. Although attributes such as gender and age cannot, on their own, identify the corresponding individual, their combination with other datasets can lead to unique records in the dataset and a consequent linkage to a real-world individual. With data anonymization, it is possible to ensure, with varying degrees of protection, that such linkage is avoided as far as possible. However, this process can result in a decline in data utility. In this work, we explore the terminology and some of the techniques that can be used during the data anonymization process. Moreover, we show the effects of these techniques on information loss, data utility, and re-identification risk when applied to a dataset of personal information collected from students who have completed higher education. Finally, once the results are presented, we perform an analysis and comparative discussion of the results obtained.
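
    One standard technique of the kind explored in such work is generalization; the small, hypothetical age hierarchy below illustrates it (not the exact hierarchies used in the thesis). Each level trades more information loss for larger, safer equivalence classes.

        def generalize_age(age: int, level: int) -> str:
            if level == 0:                    # no generalization
                return str(age)
            if level == 1:                    # 5-year bands
                low = (age // 5) * 5
                return f"{low}-{low + 4}"
            if level == 2:                    # 10-year bands
                low = (age // 10) * 10
                return f"{low}-{low + 9}"
            return "*"                        # full suppression

        for level in range(4):
            print(level, [generalize_age(a, level) for a in (23, 24, 31, 37)])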

    Risks of Privacy-Enhancing Technologies: Complexity and Implications of Differential Privacy in the Context of Cybercrime

    In recent years, the swift expansion of technology-enabled data harvesting has infiltrated modern life and led to the collection of massive amounts of private data. As a result, the preservation of individual privacy has become a salient concern for the general public. Combined with an increase in the frequency and prevalence of cybercrime, more of the public now face the very real risk of privacy loss associated with illegitimate use of private data. Differential Privacy has emerged as a relatively new privacy-preserving method with the potential to significantly reduce the likelihood of harmful data disclosures stemming from malicious use. However, research has not explicitly investigated Differential Privacy from the perspective of criminal justice or examined its utility as a possible situational crime prevention measure against cybercrime. Therefore, this chapter explores the proliferation of cybercrime through advances in technology and briefly examines other privacy-preserving methods before discussing the possible use of Differential Privacy as a viable countermeasure to cybercrime. The chapter concludes with a discussion of several practical considerations related to the use of Differential Privacy as a tool in the fight against cybercrime and offers recommendations for future research.
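
    The chapter is concerned with policy and crime-prevention implications rather than mechanics, but a minimal Laplace-mechanism sketch conveys the core idea behind Differential Privacy: noise calibrated to a query's sensitivity and a privacy budget epsilon (illustrative values only).

        import random

        def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
            # Add Laplace(0, sensitivity/epsilon) noise so that any one individual's
            # presence changes the output distribution only slightly.
            scale = sensitivity / epsilon
            # The difference of two unit exponentials is Laplace(0, 1) distributed.
            noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
            return true_value + noise

        true_count = 128                      # e.g. a counting query, sensitivity 1
        for eps in (0.1, 1.0, 10.0):
            print(f"epsilon={eps}: noisy count ~ {laplace_mechanism(true_count, 1.0, eps):.1f}")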

    On Anonymizing the Provenance of Collection-Based Workflows

    In this paper we examine the problem of anonymizing the provenance of collection-oriented workflows, in which the constituent modules use and generate sets of data records. Despite their popularity, this kind of workflow has been overlooked in the literature with respect to privacy. We therefore set out to examine the following questions: How can the provenance of a collection-based module be anonymized? Can lineage information be preserved? Beyond a single module, how can the provenance of a whole workflow be anonymized? As well as addressing the above questions, we report on evaluation exercises that assess the effectiveness and efficiency of our solution. In particular, we tease apart the parameters that impact the quality of the obtained anonymized provenance information.
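
    The paper's module-level anonymization is not reproduced here; as a rough illustration of the general setting, the sketch below replaces record identifiers in collection-level lineage with salted pseudonyms, so that which output records derive from which input records remains visible while the original identifiers do not.

        import hashlib

        def pseudonym(record_id: str, salt: str) -> str:
            # Consistent, non-reversible pseudonym for a record identifier.
            return hashlib.sha256((salt + record_id).encode()).hexdigest()[:10]

        def anonymize_lineage(lineage, salt):
            # lineage maps each output record id to the input record ids it was derived from.
            return {pseudonym(out, salt): [pseudonym(i, salt) for i in inputs]
                    for out, inputs in lineage.items()}

        lineage = {"out-1": ["in-1", "in-2"], "out-2": ["in-2", "in-3"]}
        print(anonymize_lineage(lineage, salt="workflow-42"))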