72 research outputs found

    Privacy preserving association rule mining using attribute-identity mapping

    Get PDF
    Association rule mining uncovers hidden yet important patterns in data. Discovery of the patterns helps data owners to make right decision to enhance efficiency, increase profit and reduce loss. However, there is privacy concern especially when the data owner is not the miner or when many parties are involved. This research studied privacy preserving association rule mining (PPARM) of horizontally partitioned and outsourced data. Existing research works in the area concentrated mainly on the privacy issue and paid very little attention to data quality issue. Meanwhile, the more the data quality, the more accurate and reliable will the association rules be. Consequently, this research proposed Attribute-Identity Mapping (AIM) as a PPARM technique to address the data quality issue. Given a dataset, AIM identifies set of attributes, attribute values for each attribute. It then assigns ‘unique’ identity for each of the attributes and their corresponding values. It then generates sanitized dataset by replacing each attribute and its values with their corresponding identities. For privacy preservation purpose, the sanitization process will be carried out by data owners. They then send the sanitized data, which is made up of only identities, to data miner. When any or all the data owners need(s) ARM result from the aggregate data, they send query to the data miner. The query constitutes attributes (in form of identities), minSup and minConf thresholds and then number of rules they are want. Results obtained show that the PPARM technique maintains 100% data quality without compromising privacy, using Census Income dataset

    Privacy Preserving Data Mining using Jaya based Genetic Algorithm

    Get PDF
    Privacy protection has emerged as a key concern in the field of data mining because of the rise in the sharing of sensitive data across networks among organizations, governments, and other parties. For   knowledge extraction from these huge set of data, association rule mining is used to analyze the patterns of data. For a variety of optimization issues encountered in the real world, evolutionary algorithms (EAs) offer efficient solutions. The available EA solutions in the privacy-preserving area are limited to specific issues like cost function evaluation. Here, a JAYA based Genetic algorithm has been proposed for privacy preservation.“JAYA” means victory a word from Sanskrit origin..This algorithm doesn't need any parameters that are specific to it and it moves towards best solution avoiding the worst. Hence the name Jaya. Jaya based genetic algorithm is applied to original dataset of chromosomes. Privacy preservation is achieved by comparing the support of original dataset of chromosome with the simulation output

    Heterogeneous differential privacy for vertically partitioned databases

    Full text link
    © 2019 John Wiley & Sons, Ltd. Existing privacy-preserving approaches are generally designed to provide privacy guarantee for individual data in a database, which reduces the utility of the database for data analysis. In this paper, we propose a novel differential privacy mechanism to preserve the heterogeneous privacy of a vertically partitioned database based on attributes. We first present the concept of privacy label, which characterizes the privacy information of the database and is instantiated by the classification. Then, we use an information-based method to systematically explore the dependencies between all attributes and the privacy label. We finally assign privacy weights to every attribute and design a heterogeneous mechanism according to the basic Laplace mechanism. Evaluations using real datasets demonstrate that the proposed mechanism achieves a balanced privacy and utility

    A survey of state-of-the-art methods for securing medical databases

    Get PDF
    This review article presents a survey of recent work devoted to advanced state-of-the-art methods for securing of medical databases. We concentrate on three main directions, which have received attention recently: attribute-based encryption for enabling secure access to confidential medical databases distributed among several data centers; homomorphic encryption for providing answers to confidential queries in a secure manner; and privacy-preserving data mining used to analyze data stored in medical databases for verifying hypotheses and discovering trends. Only the most recent and significant work has been included

    Identity, location and query privacy for smart devices

    Full text link
    In this thesis, we have discussed three important aspects of users\u27 privacy namely, location privacy, identity privacy and query privacy. The information related to identity, location and query is very sensitive as it can reveal behavior patterns, interests, preferences and habits of the users. We have proposed several techniques in the thesis on how to better protect the identity, location and query privacy

    Business clients´segmentation based on activity : a banking approach

    Get PDF
    Internship report presented as partial requirement for obtaining the Master’s degree in Information Management, with a specialization in knowledge Management and Business IntelligenceClustering algorithms are frequently used by companies to segment their customers in order to develop accurate marketing strategies. The K-means is one of the most popular algorithms, despite its drawbacks in terms of seeds’ generation. In this study, several clustering algorithms were tested but in the end the K-means initialized with random seeds was used to segment the data due to its better performance. This B2B segmentation resulted in four clusters based on the activity patterns of each business client, The Loyals, The Minglers, The Challengers and The Believers. Each one of these clusters shows a different type of relationship with the bank, being the bank the first choice for The Loyals and for the Believers but not for the others

    Equally contributory privacy-preserving k-means clustering over vertically partitioned data

    No full text
    In recent years, there have been numerous attempts to extend the k-means clustering protocol for single database to a distributed multiple database setting and meanwhile keep privacy of each data site. Current solutions for (whether two or more) multiparty k-means clustering, built on one or more secure two-party computation algorithms, are not equally contributory, in other words, each party does not equally contribute to k-means clustering. This may lead a perfidious attack where a party who learns the outcome prior to other parties tells a lie of the outcome to other parties. In this paper, we present an equally contributory multiparty k-means clustering protocol for vertically partitioned data, in which each party equally contributes to k-means clustering. Our protocol is built on ElGamal's encryption scheme, Jakobsson and Juels's plaintext equivalence test protocol, and mix networks, and protects privacy in terms that each iteration of k-means clustering can be performed without revealing the intermediate values

    Privacy-preserving data analytics in cloud computing

    Get PDF
    The evolution of digital content and rapid expansion of data sources has raised the need for streamlined monitoring, collection, storage and analysis of massive, heterogeneous data to extract useful knowledge and support decision-making mechanisms. In this context, cloud computing o↵ers extensive, cost-e↵ective and on demand computing resources that improve the quality of services for users and also help service providers (enterprises, governments and individuals). Service providers can avoid the expense of acquiring and maintaining IT resources while migrating data and remotely managing processes including aggregation, monitoring and analysis in cloud servers. However, privacy and security concerns of cloud computing services, especially in storing sensitive data (e.g. personal, healthcare and financial) are major challenges to the adoption of these services. To overcome such barriers, several privacy-preserving techniques have been developed to protect outsourced data in the cloud. Cryptography is a well-known mechanism that can ensure data confidentiality in the cloud. Traditional cryptography techniques have the ability to protect the data through encryption in cloud servers and data owners can retrieve and decrypt data for their processing purposes. However, in this case, cloud users can use the cloud resources for data storage but they cannot take full advantage of cloud-based processing services. This raises the need to develop advanced cryptosystems that can protect data privacy, both while in storage and in processing in the cloud. Homomorphic Encryption (HE) has gained attention recently because it can preserve the privacy of data while it is stored and processed in the cloud servers and data owners can retrieve and decrypt their processed data to their own secure side. Therefore, HE o↵ers an end-to-end security mechanism that is a preferable feature in cloud-based applications. In this thesis, we developed innovative privacy-preserving cloud-based models based on HE cryptosystems. This allowed us to build secure and advanced analytic models in various fields. We began by designing and implementing a secure analytic cloud-based model based on a lightweight HE cryptosystem. We used a private resident cloud entity, called ”privacy manager”, as an intermediate communication server between data owners and public cloud servers. The privacy manager handles analytical tasks that cannot be accomplished by the lightweight HE cryptosystem. This model is convenient for several application domains that require real-time responses. Data owners delegate their processing tasks to the privacy manager, which then helps to automate analysis tasks without the need to interact with data owners. We then developed a comprehensive, secure analytical model based on a Fully Homomorphic Encryption (FHE), that has more computational capability than the lightweight HE. Although FHE can automate analysis tasks and avoid the use of the privacy manager entity, it also leads to massive computational overhead. To overcome this issue, we took the advantage of the massive cloud resources by designing a MapReduce model that massively parallelises HE analytical tasks. Our parallelisation approach significantly speeds up the performance of analysis computations based on FHE. We then considered distributed analytic models where the data is generated from distributed heterogeneous sources such as healthcare and industrial sensors that are attached to people or installed in a distributed-based manner. We developed a secure distributed analytic model by re-designing several analytic algorithms (centroid-based and distribution-based clustering) to adapt them into a secure distributed-based models based on FHE. Our distributed analytic model was developed not only for distributed-based applications, but also it eliminates FHE overhead obstacle by achieving high efficiency in FHE computations. Furthermore, the distributed approach is scalable across three factors: analysis accuracy, execution time and the amount of resources used. This scalability feature enables users to consider the requirements of their analysis tasks based on these factors (e.g. users may have limited resources or time constrains to accomplish their analysis tasks). Finally, we designed and implemented two privacy-preserving real-time cloud-based applications to demonstrate the capabilities of HE cryptosystems, in terms of both efficiency and computational capabilities for applications that require timely and reliable delivery of services. First, we developed a secure cloud-based billing model for a sensor-enabled smart grid infrastructure by using lightweight HE. This model handled billing analysis tasks for individual users in a secure manner without the need to interact with any trusted parties. Second, we built a real-time secure health surveillance model for smarter health communities in the cloud. We developed a secure change detection model based on an exponential smoothing technique to predict future changes in health vital signs based on FHE. Moreover, we built an innovative technique to parallelise FHE computations which significantly reduces computational overhead

    Exécutions de requêtes respectueuses de la vie privée par utilisation de composants matériels sécurisés

    Get PDF
    Current applications, from complex sensor systems (e.g. quantified self) to online e-markets acquire vast quantities of personal information which usually end-up on central servers. This massive amount of personal data, the new oil, represents an unprecedented potential for applications and business. However, centralizing and processing all one's data in a single server, where they are exposed to prying eyes, poses a major problem with regards to privacy concern.Conversely, decentralized architectures helping individuals keep full control of their data, but they complexify global treatments and queries, impeding the development of innovative services.In this thesis, we aim at reconciling individual's privacy on one side and global benefits for the community and business perspectives on the other side. It promotes the idea of pushing the security to secure hardware devices controlling the data at the place of their acquisition. Thanks to these tangible physical elements of trust, secure distributed querying protocols can reestablish the capacity to perform global computations, such as SQL aggregates, without revealing any sensitive information to central servers.This thesis studies the subset of SQL queries without external joins and shows how to secure their execution in the presence of honest-but-curious attackers. It also discusses how the resulting querying protocols can be integrated in a concrete decentralized architecture. Cost models and experiments on SQL/AA, our distributed prototype running on real tamper-resistant hardware, demonstrate that this approach can scale to nationwide applications.Les applications actuelles, des systèmes de capteurs complexes (par exemple auto quantifiée) aux applications de e-commerce, acquièrent de grandes quantités d'informations personnelles qui sont habituellement stockées sur des serveurs centraux. Cette quantité massive de données personnelles, considéré comme le nouveau pétrole, représente un important potentiel pour les applications et les entreprises. Cependant, la centralisation et le traitement de toutes les données sur un serveur unique, où elles sont exposées aux indiscrétions de son gestionnaire, posent un problème majeur en ce qui concerne la vie privée.Inversement, les architectures décentralisées aident les individus à conserver le plein de contrôle sur leurs données, toutefois leurs traitements en particulier le calcul de requêtes globales deviennent complexes.Dans cette thèse, nous visons à concilier la vie privée de l'individu et l'exploitation de ces données, qui présentent des avantages manifestes pour la communauté (comme des études statistiques) ou encore des perspectives d'affaires. Nous promouvons l'idée de sécuriser l'acquisition des données par l'utilisation de matériel sécurisé. Grâce à ces éléments matériels tangibles de confiance, sécuriser des protocoles d'interrogation distribués permet d'effectuer des calculs globaux, tels que les agrégats SQL, sans révéler d'informations sensibles à des serveurs centraux.Cette thèse étudie le sous-groupe de requêtes SQL sans jointures et montre comment sécuriser leur exécution en présence d'attaquants honnêtes-mais-curieux. Cette thèse explique également comment les protocoles d'interrogation qui en résultent peuvent être intégrés concrètement dans une architecture décentralisée. Nous démontrons que notre approche est viable et peut passer à l'échelle d'applications de la taille d'un pays par un modèle de coût et des expériences réelles sur notre prototype, SQL/AA

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 277, GIScience 2023, Complete Volum
    corecore