User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy
Recommender systems have become an integral part of many social networks and
extract knowledge from a user's personal and sensitive data, both explicitly,
with the user's knowledge, and implicitly, without it. This trend has created
major privacy concerns, as users are mostly unaware of what data is collected,
how much of it is used, and how securely it is handled. In this context,
several works have addressed the privacy concerns raised by the use of online
social network data and by recommender systems. This paper surveys the main
privacy concerns, measurements, and privacy-preserving techniques used in
large-scale online social networks and recommender systems. It draws on prior
work on security, privacy preservation, statistical modeling, and datasets to
provide an overview of the technical difficulties and problems associated with
privacy preservation in online social networks.
Comment: 26 pages, IET book chapter on big data recommender systems
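One family of privacy-preserving techniques commonly covered in such surveys is differential privacy, where calibrated noise is added to a user's data before a recommender ever sees it. The sketch below is purely illustrative (the 1-5 rating scale, the `perturb_rating` helper, and the sensitivity value are assumptions, not details from the survey):

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u)))

def perturb_rating(rating: float, epsilon: float, sensitivity: float = 4.0) -> float:
    # On a 1-5 rating scale, one user can change a rating by at most 4,
    # so the query sensitivity is 4; noise scale is sensitivity / epsilon.
    noisy = rating + laplace_noise(sensitivity / epsilon)
    return min(5.0, max(1.0, noisy))  # clamp back onto the rating scale

noisy = perturb_rating(3.0, epsilon=0.5)
```

Smaller values of epsilon give stronger privacy but noisier ratings, which is exactly the privacy-utility trade-off such surveys measure.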
Privacy-Preserving Data Integration for Health
The digital transformation of health processes has resulted in the collection of vast amounts of health-related data that presents significant potential to support medical research projects and improve the healthcare system. Many of these possibilities arise as a consequence of integrating data from different sources to create an accurate and unified representation of the underlying data and enable detailed data analysis that is not possible through any individual source. Achieving this vision requires the collection and processing of sensitive health-related data about individuals, so privacy and confidentiality implications have to be considered. In this paper, I describe my doctoral research topic: the design and development of a novel Privacy-Preserving Data Integration (PPDI) framework which aims to effectively address the challenges and opportunities of integrating Big Health Data (BHD) while ensuring compliance with the General Data Protection Regulation (GDPR). The paper describes the planned methodology for implementing the PPDI process through the use of data pseudonymization techniques and Privacy-Preserving Record Linkage (PPRL) methods, and provides an overview of the new framework, which is based on the re-implementation of MOMIS towards a microservices architecture with added PPDI functionalities.
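A common building block for PPRL is pseudonymization via keyed hashing: each source normalizes its linkage fields and replaces them with an HMAC, so records about the same person can be joined without ever exchanging identities. A minimal sketch (the field names, normalization rules, and key handling here are illustrative assumptions, not the framework's actual design):

```python
import hmac
import hashlib

def pseudonymize(record: dict, key: bytes) -> str:
    # Normalize the linkage fields (lower-case, strip spaces) so the same
    # person yields the same pseudonym at every data source.
    fields = ("first_name", "last_name", "dob")
    canonical = "|".join(str(record[f]).strip().lower() for f in fields)
    # Keyed hash: without the shared secret key, an attacker cannot
    # recompute pseudonyms, so dictionary attacks on common names fail.
    return hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()

key = b"shared-linkage-secret"  # in practice held by a trusted linkage unit
a = pseudonymize({"first_name": "Ana ", "last_name": "Silva", "dob": "1980-02-01"}, key)
b = pseudonymize({"first_name": "ana", "last_name": "SILVA", "dob": "1980-02-01"}, key)
assert a == b  # the two sources link on the pseudonym, not on identities
```

Exact-match hashing like this fails on typos; production PPRL systems typically layer error-tolerant encodings (e.g. Bloom filters) on top of the same keyed-hash idea.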
Misusability Measure Based Sanitization of Big Data for Privacy Preserving MapReduce Programming
Leakage and misuse of sensitive data is a challenging problem for enterprises, and it has become more serious with the advent of cloud computing and big data. The rationale behind this is the increased outsourcing of data to public clouds and publishing of data for wider visibility. Therefore, Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM), and Privacy Preserving Distributed Data Mining (PPDDM) are crucial in the contemporary era. PPDP and PPDM can protect privacy at the data and process levels, respectively. With big data, privacy protection has become indispensable because data is stored and processed in semi-trusted environments. In this paper we propose a comprehensive methodology for effective sanitization of data based on a misusability measure, preserving privacy to prevent data leakage and misuse. We follow a hybrid approach that caters to the needs of privacy-preserving MapReduce programming. We propose an algorithm known as the Misusability Measure-Based Privacy Preserving Algorithm (MMPP), which considers the level of misusability prior to choosing and applying appropriate sanitization to big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy-preserving MapReduce programming.
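The core idea of misusability-driven sanitization can be sketched as: score each record by how much damage its leakage could cause, then suppress or generalize attributes until the score drops below a tolerance. The weights, attribute names, and threshold below are hypothetical illustrations, not the paper's MMPP algorithm:

```python
# Hypothetical per-attribute sensitivity weights (illustrative only).
WEIGHTS = {"diagnosis": 0.9, "salary": 0.7, "zip": 0.3, "age": 0.2}

def misusability(record: dict) -> float:
    # Higher score = more potential harm if the record leaks
    # (an M-score-style measure over the attributes still present).
    return sum(w for attr, w in WEIGHTS.items() if record.get(attr) is not None)

def sanitize(record: dict, threshold: float = 1.0) -> dict:
    # Suppress the most sensitive attributes first, stopping as soon as
    # the residual misusability falls below the tolerated threshold.
    out = dict(record)
    for attr, _ in sorted(WEIGHTS.items(), key=lambda kv: -kv[1]):
        if misusability(out) <= threshold:
            break
        out[attr] = None  # suppression; generalization is an alternative step
    return out

rec = {"diagnosis": "flu", "salary": 50000, "zip": "12345", "age": 34}
clean = sanitize(rec)  # drops "diagnosis" and "salary", keeps "zip" and "age"
```

In a MapReduce setting, a scoring and suppression step like this would run in the map phase before records reach downstream consumers.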
Privacy in the Genomic Era
Genome sequencing technology has advanced at a rapid pace and it is now
possible to generate highly-detailed genotypes inexpensively. The collection
and analysis of such data has the potential to support various applications,
including personalized medical services. While the benefits of the genomics
revolution are trumpeted by the biomedical community, the increased
availability of such data has major implications for personal privacy; notably
because the genome has certain essential features, which include (but are not
limited to) (i) an association with traits and certain diseases, (ii)
identification capability (e.g., forensics), and (iii) revelation of family
relationships. Moreover, direct-to-consumer DNA testing increases the
likelihood that genome data will be made available in less regulated
environments, such as the Internet and for-profit companies. The problem of
genome data privacy thus resides at the crossroads of computer science,
medicine, and public policy. While computer scientists have addressed data
privacy for various data types, less attention has been dedicated to genomic
data. Thus, the goal of this paper is to provide a systematization of
knowledge for the computer science community. In doing so, we address some of
the (sometimes erroneous) beliefs of this field and we report on a survey we
conducted about genome data privacy with biomedical specialists. Then, after
characterizing the genome privacy problem, we review the state-of-the-art
regarding privacy attacks on genomic data and strategies for mitigating such
attacks, as well as contextualizing these attacks from the perspective of
medicine and public policy. This paper concludes with an enumeration of the
challenges for genome data privacy and presents a framework to systematize the
analysis of threats and the design of countermeasures as the field moves
forward.
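One class of privacy attacks covered by such systematizations is membership inference against pooled genomic statistics: an attacker with a target's genotype tests whether published allele frequencies of a study pool are shifted toward that genotype relative to a public reference population. The sketch below captures only the intuition of a Homer-style test; the SNP data is synthetic and the statistic is a simplified stand-in, not the published formulation:

```python
def membership_statistic(genotype, pool_freqs, ref_freqs):
    # genotype[i] in {0, 1, 2}: copies of the minor allele at SNP i.
    # If the target contributed to the pool, pool frequencies are pulled
    # toward the target's alleles, so the statistic tends to be positive.
    t = 0.0
    for g, p_pool, p_ref in zip(genotype, pool_freqs, ref_freqs):
        x = g / 2.0  # the target's own allele frequency at this SNP
        t += abs(x - p_ref) - abs(x - p_pool)
    return t

target = [2, 0, 2, 0]                    # synthetic genotype over 4 SNPs
reference = [0.5, 0.5, 0.5, 0.5]         # public population frequencies
pool_with_target = [0.6, 0.4, 0.6, 0.4]  # pool frequencies shifted by the target
in_pool_score = membership_statistic(target, pool_with_target, reference)
```

Real attacks aggregate such per-SNP signals over hundreds of thousands of markers, which is why even "anonymous" aggregate releases can reveal participation.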
Supporting Regularized Logistic Regression Privately and Efficiently
As one of the most popular statistical and machine learning models, logistic
regression with regularization has found wide adoption in biomedicine, social
sciences, information technology, and so on. These domains often involve data
on human subjects that is subject to strict privacy regulations.
Increasing concerns over data privacy make it more and more difficult to
coordinate and conduct large-scale collaborative studies, which typically rely
on cross-institution data sharing and joint analysis. Our work here focuses on
safeguarding regularized logistic regression, a machine learning model widely
used across disciplines that has nevertheless not been investigated from a
data security and privacy perspective. We consider a common use scenario
of multi-institution collaborative studies, such as in the form of research
consortia or networks as widely seen in genetics, epidemiology, social
sciences, etc. To make our privacy-enhancing solution practical, we demonstrate
a non-conventional and computationally efficient method leveraging distributed
computing and strong cryptography to provide comprehensive protection over
individual-level and summary data. Extensive empirical evaluation on several
studies validated the privacy guarantees, efficiency and scalability of our
proposal. We also discuss the practical implications of our solution for
large-scale studies and applications from various disciplines, including
genetic and biomedical studies, smart grids, network analysis, etc.
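A standard cryptographic ingredient in such multi-institution settings is additive secret sharing: each site splits its local summary statistic (e.g. a gradient component of the regularized logistic loss) into random shares, and a coordinator only ever reconstructs the sum. The sketch below illustrates that primitive under assumptions of our own (the field size, fixed-point scaling, and three-site setup are illustrative, not the paper's protocol):

```python
import random

Q = 2**61 - 1  # a large prime field for the shares

def share(value: int, n: int) -> list:
    # Split an integer into n additive shares modulo Q; any n-1 shares
    # are uniformly random and reveal nothing about the value.
    shares = [random.randrange(Q) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % Q

# Three institutions each hold a local gradient component, scaled to
# fixed-point integers; only sums of shares are ever revealed.
SCALE = 10**6
local_grads = [0.12, -0.05, 0.30]  # illustrative per-site values
all_shares = [share(int(g * SCALE), 3) for g in local_grads]
# Each aggregator sums one share from every site, then the sums combine.
summed = [sum(col) % Q for col in zip(*all_shares)]
total = reconstruct(summed)
if total > Q // 2:          # map back from the field to a signed value
    total -= Q
global_grad = total / SCALE  # equals the sum of the local gradients
```

With the aggregated gradient recovered exactly, the consortium can run ordinary regularized logistic regression updates while no party sees another's individual-level contribution.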
Privacy Guarantees in the Exploration of Distributed Databases (Garantia de privacidade na exploração de bases de dados distribuídas)
Anonymisation is currently one of the biggest challenges when sharing sensitive
personal information. Its importance depends largely on the application domain,
but it becomes a more serious issue when dealing with health information. A
simple approach to avoid disclosure is to ensure that all data that can be
associated directly with an individual is removed from the original dataset.
However, some studies have shown that simple anonymisation procedures can
sometimes be reverted using specific patients' characteristics, namely when the
anonymisation is based on hidden key attributes.
In this work, we propose a secure architecture to share information from
distributed databases without compromising the subjects' privacy. The work
initially focused on identifying techniques to link information between
multiple data sources in order to revert the anonymisation procedures. In a
second phase, a methodology to perform queries over distributed databases was
developed. The architecture was validated using a standard data schema that is
widely adopted in observational research studies.
Mestrado em Cibersegurança (Master's in Cybersecurity)
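The weakness of simply removing direct identifiers is that remaining quasi-identifiers (age, postcode, etc.) can still single a person out; a standard countermeasure is to check and enforce k-anonymity, so every quasi-identifier combination is shared by at least k records. A minimal sketch (the attribute names, the decade-bucket generalization, and the toy table are illustrative assumptions, not the thesis's architecture):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    # A table is k-anonymous if every combination of quasi-identifier
    # values occurs in at least k rows.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return all(count >= k for count in groups.values())

def generalize_age(rows, width=10):
    # One simple generalization step: replace exact age by a decade bucket.
    def bucket(age):
        lo = (age // width) * width
        return f"{lo}-{lo + width - 1}"
    return [{**r, "age": bucket(r["age"])} for r in rows]

patients = [
    {"age": 31, "zip": "1000", "diagnosis": "flu"},
    {"age": 34, "zip": "1000", "diagnosis": "asthma"},
    {"age": 38, "zip": "1000", "diagnosis": "flu"},
]
# Exact ages make every row unique on (age, zip); bucketing ages to
# "30-39" merges all three rows into one indistinguishable group.
generalized = generalize_age(patients)
```

Linkage attacks of the kind studied in the first phase of the work succeed precisely when such a check fails, i.e. when some quasi-identifier combination is unique.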