8,425 research outputs found
Garantia de privacidade na exploração de bases de dados distribuídas
Anonymisation is currently one of the biggest challenges when sharing sensitive
personal information. Its importance depends largely on the application
domain, but when dealing with health information, this becomes a more serious
issue. A simpler approach to avoid this disclosure is to ensure that all
data that can be associated directly with an individual is removed from the
original dataset. However, some studies have shown that simple anonymisation
procedures can sometimes be reverted using specific patients’ characteristics,
namely when the anonymisation is based on hidden key attributes.
In this work, we propose a secure architecture to share information from distributed
databases without compromising the subjects’ privacy. The work
was initially focused on identifying techniques to link information between
multiple data sources, in order to revert the anonymization procedures. In
a second phase, we developed the methodology to perform queries over
distributed databases was proposed. The architecture was validated using
a standard data schema that is widely adopted in observational research
studies.A garantia da anonimização de dados é atualmente um dos maiores desafios
quando existe a necessidade de partilhar informações pessoais de carácter
sensível. Apesar de ser um problema transversal a muitos domínios de
aplicação, este torna-se mais crítico quando a anonimização envolve dados
clinicos. Nestes casos, a abordagem mais comum para evitar a divulgação
de dados, que possam ser associados diretamente a um indivíduo, consiste
na remoção de atributos identificadores. No entanto, segundo a literatura,
esta abordagem não oferece uma garantia total de anonimato, que pode ser
quebrada através de ataques específicos que permitem a reidentificação dos
sujeitos.
Neste trabalho, é proposta uma arquitetura que permite partilhar dados
armazenados em repositórios distribuídos, de forma segura e sem comprometer
a privacidade. Numa primeira fase deste trabalho, foi feita uma análise
de técnicas que permitam reverter os procedimentos de anonimização. Na
fase seguinte, foi proposta uma metodologia que permite realizar pesquisas
em bases de dados distribuídas, sem que o anonimato seja quebrado. Esta
arquitetura foi validada sobre um esquema de base de dados relacional que
é amplamente utilizado em estudos clínicos observacionais.Mestrado em Ciberseguranç
From Social Data Mining to Forecasting Socio-Economic Crisis
Socio-economic data mining has a great potential in terms of gaining a better
understanding of problems that our economy and society are facing, such as
financial instability, shortages of resources, or conflicts. Without
large-scale data mining, progress in these areas seems hard or impossible.
Therefore, a suitable, distributed data mining infrastructure and research
centers should be built in Europe. It also appears appropriate to build a
network of Crisis Observatories. They can be imagined as laboratories devoted
to the gathering and processing of enormous volumes of data on both natural
systems such as the Earth and its ecosystem, as well as on human
techno-socio-economic systems, so as to gain early warnings of impending
events. Reality mining provides the chance to adapt more quickly and more
accurately to changing situations. Further opportunities arise by individually
customized services, which however should be provided in a privacy-respecting
way. This requires the development of novel ICT (such as a self- organizing
Web), but most likely new legal regulations and suitable institutions as well.
As long as such regulations are lacking on a world-wide scale, it is in the
public interest that scientists explore what can be done with the huge data
available. Big data do have the potential to change or even threaten democratic
societies. The same applies to sudden and large-scale failures of ICT systems.
Therefore, dealing with data must be done with a large degree of responsibility
and care. Self-interests of individuals, companies or institutions have limits,
where the public interest is affected, and public interest is not a sufficient
justification to violate human rights of individuals. Privacy is a high good,
as confidentiality is, and damaging it would have serious side effects for
society.Comment: 65 pages, 1 figure, Visioneer White Paper, see
http://www.visioneer.ethz.c
Privacy-Preserving Design of Data Processing Systems in the Public Transport Context
The public transport network of a region inhabited by more than 4 million people is run by a complex interplay of public and private actors. Large amounts of data are generated by travellers, buying and using various forms of tickets and passes. Analysing the data is of paramount importance for the governance and sustainability of the system. This manuscript reports the early results of the privacy analysis which is being undertaken as part of the analysis of the clearing process in the Emilia-Romagna region, in Italy, which will compute the compensations for tickets bought from one operator and used with another. In the manuscript it is shown by means of examples that the clearing data may be used to violate various privacy aspects regarding users, as well as (technically equivalent) trade secrets regarding operators. The ensuing discussion has a twofold goal. First, it shows that after researching possible existing solutions, both by reviewing the literature on general privacy-preserving techniques, and by analysing similar scenarios that are being discussed in various cities across the world, the former are found exhibiting structural effectiveness deficiencies, while the latter are found of limited applicability, typically involving less demanding requirements. Second, it traces a research path towards a more effective approach to privacy-preserving data management in the specific context of public transport, both by refinement of current sanitization techniques and by application of the privacy by design approach.
Available at: https://aisel.aisnet.org/pajais/vol7/iss4/4
The Challenges of Effectively Anonymizing Network Data
The availability of realistic network data plays a significant role in fostering collaboration and ensuring U.S. technical leadership in network security research. Unfortunately, a host of technical, legal, policy, and privacy issues limit the ability of operators to produce datasets for information security testing. In an effort to help overcome these limitations, several data collection efforts (e.g., CRAWDAD[14], PREDICT [34]) have been established in the past few years. The key principle used in all of these efforts to assure low-risk, high-value data is that of trace anonymization—the process of sanitizing data before release so that potentially sensitive information cannot be extracted
The impact of technology on data collection: Case studies in privacy and economics
Technological advancement can often act as a catalyst for scientific paradigm shifts. Today the ability to collect and process large amounts of data about individuals is arguably a paradigm-shift enabling technology in action. One manifestation of this technology within the sciences is the ability to study historically qualitative fields with a more granular quantitative lens than ever before. Despite the potential for this technology, wide-adoption is accompanied by some risks. In this thesis, I will present two case studies. The first, focuses on the impact of machine learning in a cheapest-wins motor insurance market by designing a competition-based data collection mechanism. Pricing models in the insurance industry are changing from statistical methods to machine learning. In this game, close to 2000 participants, acting as insurance companies, trained and submitted pricing models to compete for profit using real motor insurance policies --- with a roughly equal split between legacy and advanced models. With this trend towards machine learning in motion, preliminary analysis of the results suggest that future markets might realise cheaper prices for consumers. Additionally legacy models competing against modern algorithms, may experience a reduction in earning stability --- accelerating machine learning adoption. Overall, the results of this field experiment demonstrate the potential for digital competition-based studies of markets in the future. The second case studies the privacy risks of data collection technologies. Despite a large body of research in re-identification of anonymous data, the question remains: if a dataset was big enough, would records become anonymous by being "lost in the crowd"? Using 3 months of location data, we show that the risk of re-identification decreases slowly with dataset size. This risk is modelled and extrapolated to larger populations with 93% of people being uniquely identifiable using 4 points of auxiliary information among 60M people. These results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets and that alternative paradigms of data sharing are still required.Open Acces
- …