448 research outputs found

    Privacy Preservation by Disassociation

    Full text link
    In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniquesa) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive since a term may be sensitive for a user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way the adversary cannot associate with high probability a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.Comment: VLDB201

    Privacy in data publishing for tailored recommendation scenarios

    Get PDF
    Personal information is increasingly gathered and used for providing services tailored to user preferences, but the datasets used to provide such functionality can represent serious privacy threats if not appropriately protected. Work in privacy-preserving data publishing targeted privacy guarantees that protect against record re-identification, by making records indistinguishable, or sensitive attribute value disclosure, by introducing diversity or noise in the sensitive values. However, most approaches fail in the high-dimensional case, and the ones that don’t introduce a utility cost incompatible with tailored recommendation scenarios. This paper aims at a sensible trade-off between privacy and the benefits of tailored recommendations, in the context of privacy-preserving data publishing. We empirically demonstrate that significant privacy improvements can be achieved at a utility cost compatible with tailored recommendation scenarios, using a simple partition-based sanitization method

    ρ-uncertainty Anonymization by Partial Suppression

    Full text link
    Abstract. We present a novel framework for set-valued data anonymiza-tion by partial suppression regardless of the amount of background knowl-edge the attacker possesses, and can be adapted to both space-time and quality-time trade-offs in a “pay-as-you-go ” approach. While minimizing the number of item deletions, the framework attempts to either preserve the original data distribution or retain mineable useful association rules, which targets statistical analysis and association mining, two major data mining applications on set-valued data.

    Semantic attack on anonymised transaction data

    Get PDF
    Publishing data about individuals is a double-edged sword; it can provide a significant benefit for a range of organisations to help understand issues concerning individuals, and improve services they offer. However, it can also represent a serious threat to individuals’ privacy. To overcome these threats, researchers have worked on developing anonymisation methods. However, the anonymisation methods do not take into consideration the semantic relationships and meaning of data, which can be exploited by attackers to expose protected data. In our work, we study a specific anonymisation method called disassociation and investigate if it provides adequate protection for transaction data. The disassociation method hides sensitive links between transaction’s items by dividing them into chunks. We propose a de-anonymisation approach to attacking transaction data anonymised by the disassociated data. The approach exploits the semantic relationships between transaction items to reassociate them. Our findings reveal that the disassociation method may not effectively protect transaction data. Our de-anonymisation approach can recombine approximately 60% of the disassociated items and can break the privacy of nearly 70% of the protected itemets in disassociated transactions

    Stochastic Modeling and Inference of Large-scale Gene Regulatory Networks

    Get PDF
    Gene regulatory networks (GRNs) consist of thousands of genes and proteins which are dynamically interacting with each other. Researchers have investigated how to uncover these unknown interactions by observing expressions of biological molecules with various statistical/mathematical methods. Once these regulatory structures are revealed, it is necessary to understand their dynamical behaviors since pathway activities could be changed by their given conditions. Therefore, both the regulatory structure estimation and dynamics modeling of GRNs are essential for biological research. Generally, GRN dynamics are usually investigated via stochastic models since molecular interactions are basically discrete and stochastic processes. However, this stochastic nature requires heavy simulation time to find the steady-state solution of the GRNs where thousands of genes are involved. This large number of genes also causes difficulties such as dimensionality problem in estimating their regulatory structure. This thesis mainly focuses on developing methodologies for large-scale GRN analyses. It includes applications of a stochastic process theory called G-networks and a reverse engineering technique for large-scale GRNs. Additionally a series of bioinformatics techniques was applied in brain tumor data to detect disease candidate genes along with their large-scale GRNs. The proposed techniques such as stochastic modeling (bottom-up) and reverse engineering (top-down) could provide a systematic view of a complex system and an efficient guideline to identify candidate genes or pathways triggering a specific phenotype of a cell. As further work, the combinatorial use of the modeling and reverse engineering approaches would be helpful in obtaining a reliable mathematical model and even in developing a synthetic biological system

    Large-scale Wireless Local-area Network Measurement and Privacy Analysis

    Get PDF
    The edge of the Internet is increasingly becoming wireless. Understanding the wireless edge is therefore important for understanding the performance and security aspects of the Internet experience. This need is especially necessary for enterprise-wide wireless local-area networks (WLANs) as organizations increasingly depend on WLANs for mission- critical tasks. To study a live production WLAN, especially a large-scale network, is a difficult undertaking. Two fundamental difficulties involved are (1) building a scalable network measurement infrastructure to collect traces from a large-scale production WLAN, and (2) preserving user privacy while sharing these collected traces to the network research community. In this dissertation, we present our experience in designing and implementing one of the largest distributed WLAN measurement systems in the United States, the Dartmouth Internet Security Testbed (DIST), with a particular focus on our solutions to the challenges of efficiency, scalability, and security. We also present an extensive evaluation of the DIST system. To understand the severity of some potential trace-sharing risks for an enterprise-wide large-scale wireless network, we conduct privacy analysis on one kind of wireless network traces, a user-association log, collected from a large-scale WLAN. We introduce a machine-learning based approach that can extract and quantify sensitive information from a user-association log, even though it is sanitized. Finally, we present a case study that evaluates the tradeoff between utility and privacy on WLAN trace sanitization

    Privacidade em comunicações de dados para ambientes contextualizados

    Get PDF
    Doutoramento em InformáticaInternet users consume online targeted advertising based on information collected about them and voluntarily share personal information in social networks. Sensor information and data from smart-phones is collected and used by applications, sometimes in unclear ways. As it happens today with smartphones, in the near future sensors will be shipped in all types of connected devices, enabling ubiquitous information gathering from the physical environment, enabling the vision of Ambient Intelligence. The value of gathered data, if not obvious, can be harnessed through data mining techniques and put to use by enabling personalized and tailored services as well as business intelligence practices, fueling the digital economy. However, the ever-expanding information gathering and use undermines the privacy conceptions of the past. Natural social practices of managing privacy in daily relations are overridden by socially-awkward communication tools, service providers struggle with security issues resulting in harmful data leaks, governments use mass surveillance techniques, the incentives of the digital economy threaten consumer privacy, and the advancement of consumergrade data-gathering technology enables new inter-personal abuses. A wide range of fields attempts to address technology-related privacy problems, however they vary immensely in terms of assumptions, scope and approach. Privacy of future use cases is typically handled vertically, instead of building upon previous work that can be re-contextualized, while current privacy problems are typically addressed per type in a more focused way. Because significant effort was required to make sense of the relations and structure of privacy-related work, this thesis attempts to transmit a structured view of it. It is multi-disciplinary - from cryptography to economics, including distributed systems and information theory - and addresses privacy issues of different natures. As existing work is framed and discussed, the contributions to the state-of-theart done in the scope of this thesis are presented. The contributions add to five distinct areas: 1) identity in distributed systems; 2) future context-aware services; 3) event-based context management; 4) low-latency information flow control; 5) high-dimensional dataset anonymity. Finally, having laid out such landscape of the privacy-preserving work, the current and future privacy challenges are discussed, considering not only technical but also socio-economic perspectives.Quem usa a Internet vê publicidade direccionada com base nos seus hábitos de navegação, e provavelmente partilha voluntariamente informação pessoal em redes sociais. A informação disponível nos novos telemóveis é amplamente acedida e utilizada por aplicações móveis, por vezes sem razões claras para isso. Tal como acontece hoje com os telemóveis, no futuro muitos tipos de dispositivos elecónicos incluirão sensores que permitirão captar dados do ambiente, possibilitando o surgimento de ambientes inteligentes. O valor dos dados captados, se não for óbvio, pode ser derivado através de técnicas de análise de dados e usado para fornecer serviços personalizados e definir estratégias de negócio, fomentando a economia digital. No entanto estas práticas de recolha de informação criam novas questões de privacidade. As práticas naturais de relações inter-pessoais são dificultadas por novos meios de comunicação que não as contemplam, os problemas de segurança de informação sucedem-se, os estados vigiam os seus cidadãos, a economia digital leva á monitorização dos consumidores, e as capacidades de captação e gravação dos novos dispositivos eletrónicos podem ser usadas abusivamente pelos próprios utilizadores contra outras pessoas. Um grande número de áreas científicas focam problemas de privacidade relacionados com tecnologia, no entanto fazem-no de maneiras diferentes e assumindo pontos de partida distintos. A privacidade de novos cenários é tipicamente tratada verticalmente, em vez de re-contextualizar trabalho existente, enquanto os problemas actuais são tratados de uma forma mais focada. Devido a este fraccionamento no trabalho existente, um exercício muito relevante foi a sua estruturação no âmbito desta tese. O trabalho identificado é multi-disciplinar - da criptografia à economia, incluindo sistemas distribuídos e teoria da informação - e trata de problemas de privacidade de naturezas diferentes. À medida que o trabalho existente é apresentado, as contribuições feitas por esta tese são discutidas. Estas enquadram-se em cinco áreas distintas: 1) identidade em sistemas distribuídos; 2) serviços contextualizados; 3) gestão orientada a eventos de informação de contexto; 4) controlo de fluxo de informação com latência baixa; 5) bases de dados de recomendação anónimas. Tendo descrito o trabalho existente em privacidade, os desafios actuais e futuros da privacidade são discutidos considerando também perspectivas socio-económicas
    corecore