Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniques (a) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy. Comment: VLDB201
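The idea of hiding rare term combinations can be illustrated with a minimal sketch. This is not the algorithm from the paper: it is a simplified greedy toy that only looks at pairs of terms, and the names `rare_pairs`, `disassociate`, and the threshold `k` are hypothetical. Terms are grouped into chunks so that no chunk contains a pair that co-occurs in fewer than `k` records, and each chunk is then published separately with its sub-records reordered.

```python
from collections import Counter
from itertools import combinations


def rare_pairs(records, k):
    """Return term pairs that co-occur in at least one but fewer than k records."""
    counts = Counter()
    for rec in records:
        for pair in combinations(sorted(rec), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n < k}


def disassociate(records, k):
    """Toy greedy split (assumption, not the published method).

    Terms are placed into chunks so that no chunk holds a rare pair.
    Each record is then projected onto every chunk, and the projections
    are sorted so that their order no longer links chunks to each other.
    """
    rare = rare_pairs(records, k)
    support = Counter(term for rec in records for term in rec)
    chunks = []
    # Most frequent terms first; ties broken alphabetically for determinism.
    for term, _ in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        for chunk in chunks:
            if all(tuple(sorted((term, other))) not in rare for other in chunk):
                chunk.add(term)
                break
        else:
            chunks.append({term})
    published = [
        sorted(sorted(rec & chunk) for rec in records if rec & chunk)
        for chunk in chunks
    ]
    return chunks, published


records = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]
chunks, published = disassociate(records, k=2)
```

In this example the pair ("a", "c") appears in only one record, so "c" is placed in its own chunk and the published data no longer shows that "a" and "c" occurred together. Terms are preserved unchanged, which is the property the abstract emphasizes.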
Privacy in data publishing for tailored recommendation scenarios
Personal information is increasingly gathered and used for providing services tailored to user preferences, but the datasets used to provide such functionality can represent serious privacy threats if not appropriately protected. Work in privacy-preserving data publishing has targeted privacy guarantees that protect against record re-identification, by making records indistinguishable, or sensitive attribute value disclosure, by introducing diversity or noise in the sensitive values. However, most approaches fail in the high-dimensional case, and those that do not fail introduce a utility cost incompatible with tailored recommendation scenarios. This paper aims at a sensible trade-off between privacy and the benefits of tailored recommendations, in the context of privacy-preserving data publishing. We empirically demonstrate that significant privacy improvements can be achieved at a utility cost compatible with tailored recommendation scenarios, using a simple partition-based sanitization method.
ρ-uncertainty Anonymization by Partial Suppression
Abstract. We present a novel framework for set-valued data anonymization by partial suppression that works regardless of the amount of background knowledge the attacker possesses and can be adapted to both space-time and quality-time trade-offs in a “pay-as-you-go” approach. While minimizing the number of item deletions, the framework attempts to either preserve the original data distribution or retain mineable useful association rules, targeting statistical analysis and association mining, two major data mining applications on set-valued data.
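Partial suppression can be illustrated with a small sketch. The abstract does not define ρ-uncertainty, so the sketch assumes the common reading that the confidence of a rule "antecedent implies sensitive item" must not exceed ρ. It handles a single rule only, and the function names are hypothetical; the framework itself is not reproduced here.

```python
def confidence(records, antecedent, sensitive):
    """Fraction of records containing `antecedent` that also contain `sensitive`."""
    matching = [rec for rec in records if antecedent <= rec]
    if not matching:
        return 0.0
    return sum(1 for rec in matching if sensitive in rec) / len(matching)


def partially_suppress(records, antecedent, sensitive, rho):
    """Delete the sensitive item from as few records as needed (toy, one rule).

    Assumption: the privacy requirement is confidence <= rho.
    Only individual occurrences are deleted, not the item everywhere.
    """
    records = [set(rec) for rec in records]  # work on copies
    deletions = 0
    for rec in records:
        if confidence(records, antecedent, sensitive) <= rho:
            break
        if antecedent <= rec and sensitive in rec:
            rec.discard(sensitive)
            deletions += 1
    return records, deletions


data = [{"x", "s"}, {"x", "s"}, {"x"}, {"y"}]
cleaned, deleted = partially_suppress(data, {"x"}, "s", rho=0.5)
```

Here the rule from "x" to "s" starts with confidence 2/3; deleting one occurrence of "s" lowers it to 1/3, so the second occurrence is kept. This is the sense in which suppression is partial.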
Semantic attack on anonymised transaction data
Publishing data about individuals is a double-edged sword; it can provide a significant
benefit for a range of organisations to help understand issues concerning individuals,
and improve services they offer. However, it can also represent a serious threat to individuals’ privacy. To overcome these threats, researchers have worked on developing
anonymisation methods. However, the anonymisation methods do not take into consideration the semantic relationships and meaning of data, which can be exploited by
attackers to expose protected data.
In our work, we study a specific anonymisation method called disassociation and investigate if it provides adequate protection for transaction data. The disassociation
method hides sensitive links between a transaction’s items by dividing them into chunks.
We propose a de-anonymisation approach to attacking transaction data anonymised by
disassociation. The approach exploits the semantic relationships between transaction items to reassociate them.
Our findings reveal that the disassociation method may not effectively protect transaction data. Our de-anonymisation approach can recombine approximately 60% of the
disassociated items and can break the privacy of nearly 70% of the protected itemsets
in disassociated transactions.
Human Mobility Monitoring using WiFi: Analysis, Modeling, and Applications
Understanding and modeling human and device mobility has fundamental importance in mobile computing, with implications ranging from network design and location-aware technologies to urban infrastructure planning. Today's users carry a plethora of devices such as smartphones, laptops, tablets, and smartwatches, with each device offering a different set of services, resulting in different usage and mobility. This leads to the research question of understanding and modeling multiple user device trajectories. Additionally, prior research on mobility focuses on outdoor mobility even though users are known to spend 80% of their time indoors, resulting in wide gaps in knowledge in the area of indoor mobility of users and devices. Here, I try to fill the gaps in mobility modeling in the areas of understanding and modeling indoor-outdoor human mobility as well as multi-device mobility. In this thesis, I propose the characterization and modeling of human and device mobility. Further, I design and deploy mobility-aware applications for contact tracing of infectious diseases and energy-aware Heating, Ventilation, and Air Conditioning (HVAC) scheduling. I try to answer a sequence of four primary inter-related questions: (1) how do indoor and outdoor user mobility differ, (2) are multiple device trajectories belonging to a single user correlated, (3) how to model indoor mobility of users, and (4) how to design effective mobility-aware applications that are easily deployable, align with long-term goals of sustainability, and deliver positive societal impact. The insights gained from each question serve as a base on which to build the next question in the series. I present answers to these questions across three main parts of my thesis. The first part comprises the characterization and analysis of human and device mobility. In this part I design and develop a tool to extract device trajectories from WiFi system logs (syslog) and map devices to users.
These extracted trajectories and device-to-user mappings are used to characterize and empirically analyze the mobility of users at varying spatial granularity (indoor, outdoor) and to extract mobility correlations between multiple devices of users; this forms the first part of my thesis. In the second part, based on the insights gained from the multi-granular and multi-device mobility characterization stated above, I argue that mobility is inherently hierarchical in nature and propose a novel indoor human mobility modeling approach. Third, I leverage the passively observed mobility to design mobility-aware applications that either look back or look ahead in time. WiFiTrace is a look-back, or backtracking, application: a network-centric contact tracing tool to aid healthcare workers in manual contact tracing of infectious diseases. iSchedule is a look-ahead, machine-learning-based, mobility-aware energy-saving application that predicts Heating, Ventilation, and Air Conditioning (HVAC) schedules for higher energy savings while increasing user comfort.
Stochastic Modeling and Inference of Large-scale Gene Regulatory Networks
Gene regulatory networks (GRNs) consist of thousands of genes and proteins
which are dynamically interacting with each other. Researchers have investigated
how to uncover these unknown interactions by observing expressions of biological
molecules with various statistical/mathematical methods. Once these regulatory
structures are revealed, it is necessary to understand their dynamical behaviors
since pathway activities could be changed by their given conditions. Therefore,
both the regulatory structure estimation and dynamics modeling of GRNs are essential for biological research.
GRN dynamics are usually investigated via stochastic models since
molecular interactions are basically discrete and stochastic processes. However,
this stochastic nature requires heavy simulation time to find the steady-state solution of the GRNs where thousands of genes are involved. This large number of
genes also causes difficulties such as the dimensionality problem in estimating their
regulatory structure.
This thesis mainly focuses on developing methodologies for large-scale GRN
analyses. It includes applications of a stochastic process theory called G-networks
and a reverse engineering technique for large-scale GRNs. Additionally, a series
of bioinformatics techniques was applied to brain tumor data to detect disease
candidate genes along with their large-scale GRNs.
The proposed techniques such as stochastic modeling (bottom-up) and reverse
engineering (top-down) could provide a systematic view of a complex system and
an efficient guideline to identify candidate genes or pathways triggering a specific
phenotype of a cell. As further work, the combinatorial use of the modeling and
reverse engineering approaches would be helpful in obtaining a reliable mathematical model and even in developing a synthetic biological system.
Large-scale Wireless Local-area Network Measurement and Privacy Analysis
The edge of the Internet is increasingly becoming wireless. Understanding the wireless edge is therefore important for understanding the performance and security aspects of the Internet experience. This need is especially pressing for enterprise-wide wireless local-area networks (WLANs) as organizations increasingly depend on WLANs for mission-critical tasks. Studying a live production WLAN, especially a large-scale network, is a difficult undertaking. Two fundamental difficulties involved are (1) building a scalable network measurement infrastructure to collect traces from a large-scale production WLAN, and (2) preserving user privacy while sharing these collected traces with the network research community. In this dissertation, we present our experience in designing and implementing one of the largest distributed WLAN measurement systems in the United States, the Dartmouth Internet Security Testbed (DIST), with a particular focus on our solutions to the challenges of efficiency, scalability, and security. We also present an extensive evaluation of the DIST system. To understand the severity of some potential trace-sharing risks for an enterprise-wide large-scale wireless network, we conduct privacy analysis on one kind of wireless network trace, a user-association log, collected from a large-scale WLAN. We introduce a machine-learning-based approach that can extract and quantify sensitive information from a user-association log, even though it is sanitized. Finally, we present a case study that evaluates the tradeoff between utility and privacy in WLAN trace sanitization.
Privacidade em comunicações de dados para ambientes contextualizados
Doctorate in Informatics
Internet users consume online targeted advertising based on information collected
about them and voluntarily share personal information in social networks.
Sensor information and data from smart-phones is collected and used
by applications, sometimes in unclear ways. As it happens today with smartphones,
in the near future sensors will be shipped in all types of connected
devices, enabling ubiquitous information gathering from the physical environment,
enabling the vision of Ambient Intelligence. The value of gathered data,
if not obvious, can be harnessed through data mining techniques and put to
use by enabling personalized and tailored services as well as business intelligence
practices, fueling the digital economy.
However, the ever-expanding information gathering and use undermines the
privacy conceptions of the past. Natural social practices of managing privacy
in daily relations are overridden by socially-awkward communication tools, service
providers struggle with security issues resulting in harmful data leaks,
governments use mass surveillance techniques, the incentives of the digital
economy threaten consumer privacy, and the advancement of consumer-grade
data-gathering technology enables new inter-personal abuses.
A wide range of fields attempts to address technology-related privacy problems,
however they vary immensely in terms of assumptions, scope and approach.
Privacy of future use cases is typically handled vertically, instead
of building upon previous work that can be re-contextualized, while current
privacy problems are typically addressed per type in a more focused way.
Because significant effort was required to make sense of the relations and
structure of privacy-related work, this thesis attempts to transmit a structured
view of it. It is multi-disciplinary - from cryptography to economics, including
distributed systems and information theory - and addresses privacy issues of
different natures.
As existing work is framed and discussed, the contributions to the state of the art
made in the scope of this thesis are presented. The contributions add to
five distinct areas: 1) identity in distributed systems; 2) future context-aware
services; 3) event-based context management; 4) low-latency information flow
control; 5) high-dimensional dataset anonymity. Finally, having laid out such
a landscape of the privacy-preserving work, the current and future privacy challenges
are discussed, considering not only technical but also socio-economic
perspectives.Quem usa a Internet vê publicidade direccionada com base nos seus hábitos
de navegação, e provavelmente partilha voluntariamente informação pessoal
em redes sociais. A informação disponível nos novos telemóveis é amplamente
acedida e utilizada por aplicações móveis, por vezes sem razões claras
para isso. Tal como acontece hoje com os telemóveis, no futuro muitos tipos
de dispositivos elecónicos incluirão sensores que permitirão captar dados do
ambiente, possibilitando o surgimento de ambientes inteligentes. O valor dos
dados captados, se não for óbvio, pode ser derivado através de técnicas de
análise de dados e usado para fornecer serviços personalizados e definir estratégias
de negócio, fomentando a economia digital.
No entanto estas práticas de recolha de informação criam novas questões de
privacidade. As práticas naturais de relações inter-pessoais são dificultadas
por novos meios de comunicação que não as contemplam, os problemas de
segurança de informação sucedem-se, os estados vigiam os seus cidadãos,
a economia digital leva á monitorização dos consumidores, e as capacidades
de captação e gravação dos novos dispositivos eletrónicos podem ser usadas
abusivamente pelos próprios utilizadores contra outras pessoas.
Um grande número de áreas científicas focam problemas de privacidade relacionados
com tecnologia, no entanto fazem-no de maneiras diferentes e
assumindo pontos de partida distintos. A privacidade de novos cenários é
tipicamente tratada verticalmente, em vez de re-contextualizar trabalho existente,
enquanto os problemas actuais são tratados de uma forma mais focada.
Devido a este fraccionamento no trabalho existente, um exercício muito relevante
foi a sua estruturação no âmbito desta tese. O trabalho identificado é
multi-disciplinar - da criptografia à economia, incluindo sistemas distribuídos
e teoria da informação - e trata de problemas de privacidade de naturezas
diferentes.
À medida que o trabalho existente é apresentado, as contribuições feitas por
esta tese são discutidas. Estas enquadram-se em cinco áreas distintas: 1)
identidade em sistemas distribuídos; 2) serviços contextualizados; 3) gestão
orientada a eventos de informação de contexto; 4) controlo de fluxo de
informação com latência baixa; 5) bases de dados de recomendação anónimas.
Tendo descrito o trabalho existente em privacidade, os desafios actuais
e futuros da privacidade são discutidos considerando também perspectivas
socio-económicas