Improved Technique for Preserving Privacy while Mining Real Time Big Data
With the evolution of Big Data, data owners require the assistance of a third party (e.g., the cloud) to store and analyse the data and to obtain information at a lower cost. However, maintaining privacy is a challenge in such scenarios, which may reveal sensitive information. Existing research discusses techniques that implement privacy on the original data using anonymization, randomization, and suppression. But those techniques are not scalable, suffer from information loss, and do not support real-time data, and hence are not suitable for privacy-preserving big data mining. In this research, a novel two-level privacy approach is proposed using pseudonymization and homomorphic encryption in the Spark framework. Several simulations are carried out on the collected dataset. From the results obtained, we observed that execution time is reduced by 50% and privacy is enhanced by 10%. This scheme is suitable for both privacy-preserving Big Data publishing and mining.
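The two privacy levels named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Spark: it assumes pseudonymization is done with a keyed hash (HMAC-SHA256) and uses a toy Paillier cryptosystem as one example of additively homomorphic encryption. The key, the tiny primes, and all function names are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import math
import secrets

# Hypothetical secret held by the data owner (assumption, not from the paper).
PSEUDONYM_KEY = b"data-owner-secret"


def pseudonymize(identifier):
    """Level 1: replace an identifier with a keyed hash, truncated to 16 hex chars."""
    digest = hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


# Toy Paillier parameters. These primes are far too small for real use.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse; valid because the generator is n + 1


def encrypt(m):
    """Level 2: Paillier encryption, c = (n + 1)^m * r^n mod n^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Paillier decryption, m = L(c^lambda mod n^2) * mu mod n with L(x) = (x - 1) / n."""
    return ((pow(c, LAMBDA, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    records = [("alice", 20), ("bob", 22)]
    outsourced = [(pseudonymize(name), encrypt(value)) for name, value in records]
    # The third party can sum the values without seeing names or plaintexts.
    encrypted_total = add_encrypted(outsourced[0][1], outsourced[1][1])
    print(decrypt(encrypted_total))
```

In this sketch the third party only ever handles pseudonyms and ciphertexts; the data owner keeps both the HMAC key and the Paillier private values and decrypts the aggregate.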
Privacy Preserving, Protection of Personal Data, and Big Data: a Review of the Colombia Case
Big Data promises great socially accepted and desirable benefits. However, in general terms, the datafication of life has made people lose some awareness of the risks that the massive analysis of data poses to their fundamental rights. Companies involved in the data value chain exploit this fact to maximize their benefits, even though this implies the proliferation of negative externalities borne by the information holders. The Colombian State has made great efforts regarding the protection of data and privacy, as demonstrated by Law 1266 of 2008 and Law 1581 of 2012; nevertheless, a deep literature review leads to the conclusion that there is a need to adapt to the international context
Big Data and Privacy: A Bibliometric Study
Literature review on the privacy of personal data in activities related to the concept of "Big Data". Universidad de Sevilla. Máster Universitario en Estudios Avanzados en Dirección de Empresa
Structured Data Anonymization System
The approaches most commonly used in industry to protect private data impair its utility for analytics. For this reason, this work proposes Anonylitics, a system for the anonymization of structured data that is based on preserving the distribution of numerical data while guaranteeing its privacy. The proposal keeps the information useful for business data analytics, as evidenced by a validation in which two real datasets were anonymized, demonstrating the potential of the system and its algorithms. Magíster en Ingeniería de Sistemas y Computación. Maestrí
Scalable and approximate privacy-preserving record linkage
Record linkage, the task of linking multiple databases with the aim of identifying records
that refer to the same entity, is increasingly common in many application areas.
Generally, unique entity identifiers are not available in all the databases to be linked.
Therefore, record linkage requires the use of personal identifying attributes, such as
names and addresses, to identify matching records that need to be reconciled to the
same entity. Often, it is not permissible to exchange personal identifying data across
different organizations due to privacy and confidentiality concerns or regulations.
This has led to the novel research area of privacy-preserving record linkage (PPRL).
PPRL addresses the problem of how to link different databases to identify records
that correspond to the same real-world entities, without revealing the identities of
these entities or any private or confidential information to any party involved in the process, or to any external party, such as a researcher. The three key challenges that a PPRL solution in a real-world context needs to address are (1) scalability to large databases by efficiently conducting linkage; (2) achieving high quality of linkage through the use of approximate (string) matching and effective classification of the compared record pairs into matches (i.e. pairs of records that refer to the same entity) and non-matches (i.e. pairs of records that refer to different entities); and (3) provision
of sufficient privacy guarantees such that the interested parties only learn the actual
values of certain attributes of the records that were classified as matches, and the
process is secure with regard to any internal or external adversary.
In this thesis, we present extensive research in PPRL, where we have addressed
several gaps and problems identified in existing PPRL approaches. First, we begin
the thesis with a review of the literature and we propose a taxonomy of PPRL to characterize existing techniques. This allows us to identify gaps and research directions.
In the remainder of the thesis, we address several of the identified shortcomings.
One main shortcoming we address is the lack of a framework for empirical and comparative
evaluation of different PPRL solutions, which has not been studied in the literature
so far. Second, we propose several novel algorithms for scalable and approximate
PPRL by addressing the three main challenges of PPRL. We propose efficient private
blocking techniques, for both three-party and two-party scenarios, based on sorted
neighborhood clustering to address the scalability challenge. Next, we propose
two efficient two-party techniques for private matching and classification to address the linkage quality challenge in terms of approximate matching and effective classification. Privacy is addressed in these approaches using efficient data perturbation techniques including k-anonymous mapping, reference values, and Bloom filters.
Finally, the thesis reports on an extensive comparative evaluation of our proposed
solutions with several other state-of-the-art techniques on real-world datasets, which
shows that our solutions outperform others in terms of all three key challenges.
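The Bloom filter perturbation mentioned in this abstract can be illustrated with a minimal, generic sketch. It is not the thesis's algorithm: it assumes the common scheme in which each party hashes the character bigrams of a name into a bit array with a shared secret key, and encoded values are then compared with the Dice coefficient to allow approximate matching. The filter length, number of hash functions, key, and function names are all hypothetical.

```python
import hashlib
import hmac

BLOOM_BITS = 1000               # filter length (illustrative value)
NUM_HASHES = 10                 # hash functions per bigram (illustrative value)
SHARED_KEY = b"linkage-secret"  # hypothetical key agreed by the database owners


def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = "_" + value.lower() + "_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value):
    """Return the set of bit positions that the bigrams of `value` switch on."""
    positions = set()
    for gram in bigrams(value):
        for i in range(NUM_HASHES):
            message = "{}:{}".format(i, gram).encode()
            digest = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % BLOOM_BITS)
    return positions


def dice(a, b):
    """Dice coefficient of two Bloom filters given as sets of bit positions."""
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith, smyth, jones = (bloom_encode(n) for n in ("Smith", "Smyth", "Jones"))
    # A spelling variant shares most bigrams and scores higher than an unrelated name.
    print(dice(smith, smyth), dice(smith, jones))
```

Because similar strings share most of their bigrams, their filters overlap heavily, so the parties can classify pairs as matches or non-matches by thresholding the Dice score without exchanging the names themselves.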
Secure Protocols for Privacy-preserving Data Outsourcing, Integration, and Auditing
As the amount of data available from a wide range of domains has increased tremendously in recent years, the demand for data sharing and integration has also risen. The cloud computing paradigm provides great flexibility to data owners with respect to computation and storage capabilities, which makes it a suitable platform for them to share their data. Outsourcing person-specific data to the cloud, however, raises serious concerns about the confidentiality of the outsourced data, the privacy of the individuals referenced in the data, as well as the confidentiality of the queries processed over the data. Data integration is another form of data sharing, where data owners jointly perform the integration process, and the resulting dataset is shared between them. Integrating related data from different sources enables individuals, businesses, organizations and government agencies to perform better data analysis, make better informed decisions, and provide better services. Designing distributed, secure, and privacy-preserving protocols for integrating person-specific data, however, poses several challenges, including how to prevent each party from inferring sensitive information about individuals during the execution of the protocol, how to guarantee an effective level of privacy on the released data while maintaining utility for data mining, and how to support public auditing such that anyone at any time can verify that the integration was executed correctly and no participants deviated from the protocol.
In this thesis, we address the aforementioned concerns by presenting secure protocols for privacy-preserving data outsourcing, integration and auditing. First, we propose a secure cloud-based data outsourcing and query processing framework that simultaneously preserves the confidentiality of the data and the query requests, while providing differential privacy guarantees on the query results. Second, we propose a publicly verifiable protocol for integrating person-specific data from multiple data owners, while providing differential privacy guarantees and maintaining an effective level of utility on the released data for the purpose of data mining. Next, we propose a privacy-preserving multi-party protocol for high-dimensional data mashup with guaranteed LKC-privacy on the output data.
Finally, we apply the theory to the real-world problem of solvency in Bitcoin. More specifically, we propose a privacy-preserving and publicly verifiable cryptographic proof of solvency scheme for Bitcoin exchanges such that no information is revealed about the exchange's customer holdings, the value of the exchange's total holdings is kept secret, and multiple exchanges performing the same proof of solvency can contemporaneously prove they are not colluding.
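The differential privacy guarantee on query results mentioned in this abstract can be illustrated with a minimal sketch of the standard Laplace mechanism for a counting query. It is a textbook construction, not the thesis's protocol, and it omits the encryption and auditing layers entirely. The function names and the example data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    ages = [23, 35, 41, 52, 67, 29, 38]  # hypothetical person-specific data
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier answers, which is the privacy-utility trade-off the abstract refers to when it speaks of maintaining utility for data mining.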