Search CORE

5 research outputs found

How will anonymization of simulated clinical data affect the data utility of pharmacoepidemiological studies?

Author: Cheang Chi Kei
Publication venue: 'UiT The Arctic University of Norway'
Publication date: 14/05/2019

Background: Pressure to share more data and to make clinical study reports more transparent has grown in recent years. Before clinical data and results can be shared, they must be anonymized. How anonymization of clinical data affects their utility is poorly studied, especially in pharmacoepidemiology. Objective: The aim of the study is to describe and evaluate how anonymization of simulated clinical data affects the data utility of pharmacoepidemiological analyses of these data. Method: We simulated five clinical datasets with different characteristics, associations, types of outcome and study populations. Suppression, generalization, randomization and k-anonymity were used as anonymization approaches. The methods were evaluated by the change in the data and in the statistical results before and after anonymization. Result: K-anonymity and suppression affected the simulated clinical data the most, while generalization and randomization affected the data the least. With k-anonymity and suppression there is a risk of overestimating the clinical results because unique records are eliminated. Generalization and randomization, on the other hand, preserved the most data utility but were less effective in anonymizing the data. Conclusion: Our study showed that different anonymization approaches can affect clinical results differently. The more a record or attribute is anonymized, the less utility it provides. It is therefore important to strike a balance between data utility and effectiveness of anonymization before clinical data are published. More investigation of how anonymization of clinical data affects data utility is needed in order to maximize the benefit of using anonymized clinical data to improve public health.
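To make the utility trade-off described in this abstract concrete, here is a minimal Python sketch of generalization followed by suppression to reach k-anonymity. It is not the study's code: the toy records, the ten-year age bands, the choice of age and sex as quasi-identifiers, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymize(records, k):
    """Generalize ages, then suppress every record whose (age band, sex)
    combination occurs fewer than k times in the data."""
    generalized = [
        {"age": generalize_age(r["age"]), "sex": r["sex"], "outcome": r["outcome"]}
        for r in records
    ]
    counts = Counter((r["age"], r["sex"]) for r in generalized)
    return [r for r in generalized if counts[(r["age"], r["sex"])] >= k]


def outcome_rate(records):
    """Share of records with outcome == 1 (a stand-in for a study estimate)."""
    return sum(r["outcome"] for r in records) / len(records)


# Hypothetical toy data; the last record is unique on its quasi-identifiers.
records = [
    {"age": 34, "sex": "F", "outcome": 1},
    {"age": 37, "sex": "F", "outcome": 0},
    {"age": 31, "sex": "F", "outcome": 1},
    {"age": 52, "sex": "M", "outcome": 0},
    {"age": 58, "sex": "M", "outcome": 1},
    {"age": 81, "sex": "M", "outcome": 0},
]

released = k_anonymize(records, k=2)
rate_before = outcome_rate(records)
rate_after = outcome_rate(released)
```

In this toy data the unique record is dropped and the outcome rate rises, but the direction and size of the shift depend entirely on which records happen to be unique.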

Munin - Open Research Archive

Privacy enhancing technologies (PETs) for connected vehicles in smart cities

This is an accepted manuscript of an article published by Wiley in Transactions on Emerging Telecommunications Technologies, available online: https://doi.org/10.1002/ett.4173 The accepted version of the publication may differ from the final published version. Many experts believe that the Internet of Things (IoT) is a new revolution in technology that has brought many benefits to organizations, businesses, and industries. However, information security and privacy protection are important challenges, particularly for smart vehicles in smart cities, and have attracted the attention of experts in this domain. Privacy Enhancing Technologies (PETs) endeavor to mitigate the risk of privacy invasions, but the literature lacks a thorough review of the approaches and techniques that support individuals' privacy in the connection between smart vehicles and smart cities. This gap motivated us to conduct this research, whose main goal is to review recent privacy-enhancing technologies, approaches, taxonomy, challenges, and solutions for the application of PETs to smart vehicles in smart cities. A significant aspect of this study is its inclusion of both data-oriented and process-oriented privacy protection. This research also identifies limitations of existing PETs, complementary technologies, and potential research directions. Published onlin

Crossref

E-space: Manchester Metropolitan University's Research Repository

Warwick Research Archives Portal Repository

Coventry University Pure Portal

Wolverhampton Intellectual Repository and E-theses

Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption

Author: Mansour Osman Huda Osman
Publication date: 01/01/2022

The evolution of information and communication technologies has encouraged numerous organizations to outsource their business and data to the cloud for data mining and other data processing operations. Despite its great benefits, the cloud poses a real problem for the security and privacy of data. Many studies have explained that attackers often expose information from third-party services or third-party clouds. When data owners outsource their data to the cloud, especially under the SaaS cloud model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to accomplish data mining operations while protecting the owner's data from violation. The current PPDM models have some limitations: they suffer from data disclosure caused by identity and attribute disclosure, in which some private information is revealed, enabling different types of attacks. In addition, existing solutions have poor data utility and high computational performance overhead. Therefore, this research aims to design and develop a Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model that improves the privacy-preserving level by reducing data disclosure before data are outsourced for mining over the cloud, while maintaining data utility. The proposed HAC-PPDM model further aims to reduce the computational performance overhead in order to improve efficiency. The Quasi-Identifiers Recognition (QIR) algorithm is defined and designed on the basis of attribute classification and determination of the Quasi-Identifier dimension, to overcome the identity disclosure caused by Quasi-Identifier linking and so reduce privacy leakage. An Enhanced Homomorphic Scheme is designed by hybridizing the Cloud-RSA encryption scheme, the Extended Euclidean algorithm (EE), the Fast Modular Exponentiation algorithm (FME), and the Chinese Remainder Theorem (CRT) to minimize computational time complexity while reducing attribute disclosure. 
The proposed QIR algorithm, the Enhanced Homomorphic Scheme and the k-anonymity privacy model have been hybridized to obtain optimal data privacy preservation before data are outsourced to the cloud, while maintaining a level of data utility that meets the needs of mining with good efficiency. Real-world datasets have been used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the data privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient than related works in terms of time complexity, as represented by Big O notation. Moreover, it reduced the computational time of encryption, decryption, and key generation. Finally, the proposed HAC-PPDM model successfully reduced data disclosure and improved the privacy-preserving level while preserving data utility, as it reduced information loss. In short, it improved privacy preservation and data mining (classification) accuracy by 7.59% and 0.11%, respectively.
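The building blocks named in this abstract (Extended Euclidean algorithm, fast modular exponentiation, Chinese Remainder Theorem, and RSA's partial homomorphism) can be illustrated with a minimal Python sketch. This is textbook RSA with toy primes, not the thesis's Enhanced Homomorphic Scheme or Cloud-RSA, whose details are not given in the abstract. The parameters `p`, `q`, `e` and all function names are assumptions for illustration, and unpadded RSA with small primes is not secure.

```python
def egcd(a, b):
    """Extended Euclidean algorithm: return (g, x, y) with a*x + b*y == g."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y


def modinv(a, m):
    """Modular inverse of a modulo m, computed with the extended Euclid step."""
    g, x, _ = egcd(a, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return x % m


# Toy key material (assumed values, far too small for real use).
p, q, e = 61, 53, 17
n = p * q
d = modinv(e, (p - 1) * (q - 1))


def encrypt(m):
    """Textbook RSA encryption; pow() performs fast modular exponentiation."""
    return pow(m, e, n)


def decrypt_crt(c):
    """RSA decryption via the Chinese Remainder Theorem: two small
    exponentiations modulo p and q instead of one large one modulo n."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = modinv(q, p)
    m1 = pow(c, dp, p)
    m2 = pow(c, dq, q)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Partial (multiplicative) homomorphism: multiplying ciphertexts
# multiplies the underlying plaintexts modulo n.
a, b = 7, 12
product_cipher = (encrypt(a) * encrypt(b)) % n
product_plain = decrypt_crt(product_cipher)
```

The CRT route is the usual way to speed up RSA decryption, which is consistent with the abstract's stated goal of lowering computational time, though the thesis may combine these pieces differently.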

Universiti Teknologi Malaysia Institutional Repository

Méthode de construction d'entrepôt de données temporalisé pour un système informationnel de santé

Author: Khnaisser Christina
Publication venue: 'Universite de Sherbrooke'
Publication date: 01/01/2016

Health information systems (HIS) have been deployed over the past 20 years to support care processes, administrative tasks and research activities, and to ensure the sound management of healthcare institutions. A data warehouse (DW) must be built from many heterogeneous data sources so that the data can be used uniformly within HIS. Temporalizing this warehouse quickly became a crucial issue for keeping track of how the data evolve and for improving clinical decision-making. A temporalized data warehouse (TDW) requires the application of systematic rules to guarantee data integrity and quality. Generating the temporal schema of a TDW is a complex task. Several questions therefore arise, including: (a) Which temporal model is best suited to automating the construction of a TDW (particularly in the health domain)? (b) Which properties can be formally guaranteed following this construction? On the one hand, the size of the data schema demands substantial human and financial resources; on the other hand, several temporal models exist, but they are either not formalized or not general. Designers therefore most often fall back on rules of practice that are varied, vague, incomplete and unvalidated. In this work, a reference framework is defined for formalizing, generalizing and operationalizing temporal models. Two models, BCDM and TRM, are presented according to the reference framework, with their integrity constraints, their construction algorithms and a list of required extensions. 
As a result, it is now possible to dispense with imprecise rules of practice and to temporalize a warehouse using a rigorous method with provable properties, based on fundamental criteria (relational theory) and on recognized, explicit design criteria (normalization).

Savoirs UdeS

A flexible approach to distributed data anonymization

Author: Barrio, Bellare, Byun, Cambon-Thomsen, Claudia Eckert, Dalenius, Emam, Fabian Prasser, Florian Kohlmayer, Fung, Gionis, Goldberger, Goldreich, Heeney, Jagannathan, Jiang, Kantarcioglu, Kaye, Kissner, Klaus A. Kuhn, LeFevre, LeFevre, Li, Malin, Malin, Mohammed, Mohammed, Nergiz, Nergiz, Payne, Samarati, Schneier, Sweeney, Tassa, Wagstaff, Wang, Wong, Xiao, Zhang, Zhong
Publication venue: 'Elsevier BV'

Crossref