399 research outputs found

    Anonimização de Dados em Educação

    Get PDF
    Interest in data privacy is not only growing, but the quantity of data collected is also increasing. This data, which is collected and stored electronically, contains information related with all aspects of our lives, frequently containing sensitive information, such as financial records, activity in social networks, location traces collected by our mobile phones and even medical records. Consequently, it becomes paramount to assure the best protection for this data, so that no harm is done to individuals even if the data is to become publicly available. To achieve it, it is necessary to avoid the linkage between records in a dataset and a real world individual. Despite some attributes, such as gender and age, though alone they can not identify a corresponding individual, their combination with other datasets can lead to the existence of unique records in the dataset and a consequent linkage to a real world individual. Therefore, with data anonymization, it is possible to assure, with various degrees of protection, that said linkage is avoided the best we can. However, this process can have a decline in data utility as consequence. In this work, we explore the terminology and some of the techniques that can be used during the process of data anonymization. Moreover, we show the effects of said techniques on information loss, data utility and re-identification risk, when applied to a dataset with personal information collected from college graduated students. Finally, and once the results are presented, we perform an analysis and comparative discussion of the obtained results.Hoje em dia é possível observar que tanto a preocupação com a privacidade dos dados pessoais como a quantidade de dados recolhidos estão a aumentar. Estes dados, recolhidos e armazenados eletronicamente, contêm informação relacionada com todos os aspetos das nossas vidas, informação essa muitas vezes sensível, tal como registos financeiros, atividade em redes sociais, rastreamento de dispositivos móveis e até registos médicos. Consequentemente, torna-se vital assegurar a proteção destes dados para que, mesmo se tornados públicos, não causem danos pessoais aos indivíduos envolvidos. Para isso, é necessário evitar que registos nos dados sejam associados a indivíduos reais. Apesar de atributos, como o género e a idade, singularmente não conseguirem identificar o individuo correspondente, a sua combinação com outros conjuntos de dados, pode levar à existência de um registo único no conjunto de dados e consequente associação a um individuo. Com a anonimização dos dados, é possível assegurar, com variados graus de proteção, que essa associação a um individuo real seja evitada ao máximo. Contudo, este processo pode ter como consequência uma diminuição na utilidade dos dados. Com este trabalho, exploramos a terminologia e algumas das técnicas que podem ser utilizadas no processo de anonimização de dados. Mostramos também os efeitos dessas várias técnicas tanto na perda de informação e utilidade dos dados, como no risco de re-identificação associado, quando aplicadas a um conjunto de dados com informação pessoal recolhida a alunos que conluíram o ensino superior. No final, e uma vez feita a apresentação dos resultados, é feita uma análise e discussão comparativa dos resultados obtidos

    Balancing between data utility and privacy preservation in data mining

    Get PDF
    Data Mining plays a vital role in today‟s information world where it has been widely applied in various organizations. The current trend needs to share data for mutual benefit. However, there has been a lot of concern over privacy in the recent years .It has also raised a potential threat of revealing sensitive data of an individual when the data is released publically. Various methods have been proposed to tackle the privacy preservation problem like anonymization and perturbation. But the natural consequence of privacy preservation is information loss. The loss of specific information about certain individuals may affect the data quality and in extreme case the data may become completely useless. There are methods like cryptography which completely anonymize the dataset and which renders the dataset useless. So the utility of the data is completely lost. We need to protect the private information and preserve the data utility as much as possible. So the objective of the thesis is to find an optimum balance between privacy and utility while publishing dataset of any organization. Privacy preservation is hard requirement that must be satisfied and utility is the measure to be optimized. One of the methods for preserving privacy is K-anonymization which also preserves privacy to a good extent. K-anonymity demands that every tuple in the dataset released be indistinguishably related to no fewer than k respondents. We used K-means algorithm for clustering the dataset and followed by k-anonymization. Decision stump classification is used to determine utility and privacy is determined by firing random queries on the anonymized dataset. The balancing point is where the utility and privacy curves intersect or they tend to converge. The balancing point will vary from dataset to dataset and the choice of Quasi-identifier and sensitive attribute. For our experiment the balancing point is found to be around 50-60 percent which is the intersecting point of privacy and utility curves

    Protecting sensitive data using differential privacy and role-based access control

    Get PDF
    Dans le monde d'aujourd'hui où la plupart des aspects de la vie moderne sont traités par des systèmes informatiques, la vie privée est de plus en plus une grande préoccupation. En outre, les données ont été générées massivement et traitées en particulier dans les deux dernières années, ce qui motive les personnes et les organisations à externaliser leurs données massives à des environnements infonuagiques offerts par des fournisseurs de services. Ces environnements peuvent accomplir les tâches pour le stockage et l'analyse de données massives, car ils reposent principalement sur Hadoop MapReduce qui est conçu pour traiter efficacement des données massives en parallèle. Bien que l'externalisation de données massives dans le nuage facilite le traitement de données et réduit le coût de la maintenance et du stockage de données locales, elle soulève de nouveaux problèmes concernant la protection de la vie privée. Donc, comment on peut effectuer des calculs sur de données massives et sensibles tout en préservant la vie privée. Par conséquent, la construction de systèmes sécurisés pour la manipulation et le traitement de telles données privées et massives est cruciale. Nous avons besoin de mécanismes pour protéger les données privées, même lorsque le calcul en cours d'exécution est non sécurisé. Il y a eu plusieurs recherches ont porté sur la recherche de solutions aux problèmes de confidentialité et de sécurité lors de l'analyse de données dans les environnements infonuagique. Dans cette thèse, nous étudions quelques travaux existants pour protéger la vie privée de tout individu dans un ensemble de données, en particulier la notion de vie privée connue comme confidentialité différentielle. Confidentialité différentielle a été proposée afin de mieux protéger la vie privée du forage des données sensibles, assurant que le résultat global publié ne révèle rien sur la présence ou l'absence d'un individu donné. Enfin, nous proposons une idée de combiner confidentialité différentielle avec une autre méthode de préservation de la vie privée disponible.In nowadays world where most aspects of modern life are handled and managed by computer systems, privacy has increasingly become a big concern. In addition, data has been massively generated and processed especially over the last two years. The rate at which data is generated on one hand, and the need to efficiently store and analyze it on the other hand, lead people and organizations to outsource their massive amounts of data (namely Big Data) to cloud environments supported by cloud service providers (CSPs). Such environments can perfectly undertake the tasks for storing and analyzing big data since they mainly rely on Hadoop MapReduce framework, which is designed to efficiently handle big data in parallel. Although outsourcing big data into the cloud facilitates data processing and reduces the maintenance cost of local data storage, it raises new problem concerning privacy protection. The question is how one can perform computations on sensitive and big data while still preserving privacy. Therefore, building secure systems for handling and processing such private massive data is crucial. We need mechanisms to protect private data even when the running computation is untrusted. There have been several researches and work focused on finding solutions to the privacy and security issues for data analytics on cloud environments. In this dissertation, we study some existing work to protect the privacy of any individual in a data set, specifically a notion of privacy known as differential privacy. Differential privacy has been proposed to better protect the privacy of data mining over sensitive data, ensuring that the released aggregate result gives almost nothing about whether or not any given individual has been contributed to the data set. Finally, we propose an idea of combining differential privacy with another available privacy preserving method

    Privacy protection via anonymization for publishing multi-type data

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Usability heuristics for fast crime data anonymization in resource-constrained contexts

    Get PDF
    This thesis considers the case of mobile crime-reporting systems that have emerged as an effective and efficient data collection method in low and middle-income countries. Analyzing the data, can be helpful in addressing crime. Since law enforcement agencies in resource-constrained context typically do not have the expertise to handle these tasks, a cost-effective strategy is to outsource the data analytics tasks to third-party service providers. However, because of the sensitivity of the data, it is expedient to consider the issue of privacy. More specifically, this thesis considers the issue of finding low-intensive computational solutions to protecting the data even from an "honest-but-curious" service provider, while at the same time generating datasets that can be queried efficiently and reliably. This thesis offers a three-pronged solution approach. Firstly, the creation of a mobile application to facilitate crime reporting in a usable, secure and privacy-preserving manner. The second step proposes a streaming data anonymization algorithm, which analyses reported data based on occurrence rate rather than at a preset time on a static repository. Finally, in the third step the concept of using privacy preferences in creating anonymized datasets was considered. By taking into account user preferences the efficiency of the anonymization process is improved upon, which is beneficial in enabling fast data anonymization. Results from the prototype implementation and usability tests indicate that having a usable and covet crime-reporting application encourages users to declare crime occurrences. Anonymizing streaming data contributes to faster crime resolution times, and user privacy preferences are helpful in relaxing privacy constraints, which makes for more usable data from the querying perspective. This research presents considerable evidence that the concept of a three-pronged solution to addressing the issue of anonymity during crime reporting in a resource-constrained environment is promising. This solution can further assist the law enforcement agencies to partner with third party in deriving useful crime pattern knowledge without infringing on users' privacy. In the future, this research can be extended to more than one low-income or middle-income countries

    Understanding the Real World through the Analysis of User Behavior and Topics in Online Social Media

    Get PDF
    Physical events happening in the real world usually trigger reactions and discussions in the digital world; a world most often represented by Online Social Media such as Twitter or Facebook. Mining these reactions through social sensors offers a fast and low cost way to explain what is happening in the physical world. A thorough understanding of these discussions and the context behind them has become critical for many applications like business or political analysis. This context includes the characteristics of the population participating in a discussion, or when it is being discussed, or why. As an example, we demonstrate how the time of the day affects the prediction of traffic on highways through the analysis of social media content. Obtaining an understanding of what is happening online and the ramifications on the real world can be enabled through the automatic summarization of Social Media. Trending topics are offered as a high level content recommendation system where users are suggested to view related content if they deem the displayed topics interesting. However, identifying the characteristics of the users focused on each topic can boost the importance even for topics that might not be popular or bursty. We define a way to characterize groups of users that are focused in such topics and propose an efficient and accurate algorithm to extract such communities. Through qualitative and quantitative experimentation we observe that topics with a strong community focus are interesting and more likely to catch the attention of users.Consequently, as trending topic extraction algorithms become more sophisticated and report additional information like the characteristics of the users that participate in a trend, significant and novel privacy issues arise. We introduce a statistical attack to infer sensitive attribute values of Online Social Networks users that utilizes such reported community-aware trending topics. Additionally, we provide an algorithmic methodology that alters an existing community-aware trending topic algorithm so that it can preserve the privacy of the involved users while still reporting trending topics with a satisfactory level of utility. From the user’s perspective, we explore the idea of a cyborg that can constantly monitor its owner’s privacy and alert them when necessary. However, apart from individuals, the notion of privacy can also extend to a group of people (or community). We study how non-private behavior of individuals can lead to exposure of the identity of a larger group. This exposure poses certain dangers, like online harassment targeted to the members of a group, potential physical attacks, group identity shift, etc. We discuss how this new privacy notion can be modeled and identify a set of core challenges and potential solutions

    Contributions to privacy in web search engines

    Get PDF
    Els motors de cerca d’Internet recullen i emmagatzemen informació sobre els seus usuaris per tal d’oferir-los millors serveis. A canvi de rebre un servei personalitzat, els usuaris perden el control de les seves pròpies dades. Els registres de cerca poden revelar informació sensible de l’usuari, o fins i tot revelar la seva identitat. En aquesta tesis tractem com limitar aquests problemes de privadesa mentre mantenim suficient informació a les dades. La primera part d’aquesta tesis tracta els mètodes per prevenir la recollida d’informació per part dels motores de cerca. Ja que aquesta informació es requerida per oferir un servei precís, l’objectiu es proporcionar registres de cerca que siguin adequats per proporcionar personalització. Amb aquesta finalitat, proposem un protocol que empra una xarxa social per tal d’ofuscar els perfils dels usuaris. La segona part tracta la disseminació de registres de cerca. Proposem tècniques que la permeten, proporcionant k-anonimat i minimitzant la pèrdua d’informació.Web Search Engines collects and stores information about their users in order to tailor their services better to their users' needs. Nevertheless, while receiving a personalized attention, the users lose the control over their own data. Search logs can disclose sensitive information and the identities of the users, creating risks of privacy breaches. In this thesis we discuss the problem of limiting the disclosure risks while minimizing the information loss. The first part of this thesis focuses on the methods to prevent the gathering of information by WSEs. Since search logs are needed in order to receive an accurate service, the aim is to provide logs that are still suitable to provide personalization. We propose a protocol which uses a social network to obfuscate users' profiles. The second part deals with the dissemination of search logs. We propose microaggregation techniques which allow the publication of search logs, providing kk-anonymity while minimizing the information loss
    corecore