5 research outputs found
How will anonymization of simulated clinical data affect the data utility of pharmacoepidemiological studies?
Background: Pressure to share more data and to make clinical study reports more transparent has grown and become an important topic in recent years. Before clinical data and clinical results can be shared, they must undergo anonymization. How anonymization of clinical data affects its utility is poorly studied, especially in pharmacoepidemiology.
Objective: The aim of the study is to describe and evaluate how anonymization of simulated clinical data will affect the data utility of pharmacoepidemiological analyses of these data.
Method: We simulated five clinical datasets with different characteristics, associations, types of outcome and study populations. Suppression, generalization, randomization and k-anonymity were used as our anonymization approaches. These methods were evaluated by the change in the data and in the statistical results before and after anonymization.
Result: K-anonymity and suppression were the methods that affected the simulated clinical data the most, while generalization and randomization affected the data the least. With k-anonymity and suppression there is a risk of overestimating the clinical results because unique records are eliminated. On the other hand, generalization and randomization preserved the most data utility but were less effective in anonymizing the data.
Conclusion: Our study revealed that different anonymization approaches can affect the clinical results differently. The more we anonymize a record or attribute, the less utility it provides. It is therefore important to strike a balance between data utility and effectiveness of anonymization before the clinical data are published. More investigations of how anonymization of clinical data affects data utility are needed in order to maximize the benefit of using anonymized clinical data to improve public health.
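The interplay of generalization, suppression and k-anonymity described in this abstract can be illustrated with a minimal Python sketch. It is not the study's code: the toy records, the 10-year age bands, the choice of `age` and `sex` as quasi-identifiers, and all function names are assumptions made for illustration only.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Generalization: replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def equivalence_classes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = equivalence_classes(records, quasi_identifiers)
    return all(c >= k for c in counts.values())

def suppress_small_classes(records, quasi_identifiers, k):
    """Suppression: drop records whose equivalence class has fewer than k members."""
    counts = equivalence_classes(records, quasi_identifiers)
    return [r for r in records
            if counts[tuple(r[q] for q in quasi_identifiers)] >= k]

def outcome_rate(records):
    """Share of records with outcome == 1."""
    return sum(r["outcome"] for r in records) / len(records)

# Hypothetical toy data, not taken from the study.
patients = [
    {"age": 34, "sex": "F", "outcome": 1},
    {"age": 37, "sex": "F", "outcome": 0},
    {"age": 52, "sex": "M", "outcome": 1},
    {"age": 58, "sex": "M", "outcome": 1},
    {"age": 81, "sex": "F", "outcome": 0},
]
qi = ["age", "sex"]  # assumed quasi-identifiers

generalized = [{**p, "age": generalize_age(p["age"])} for p in patients]
released = suppress_small_classes(generalized, qi, k=2)

print(is_k_anonymous(generalized, qi, 2))  # one record is still unique
print(is_k_anonymous(released, qi, 2))
print(outcome_rate(patients), outcome_rate(released))
```

In this toy case the single unique record is removed to reach 2-anonymity, and the outcome rate in the released data is higher than in the original data, which is the kind of overestimation the Result paragraph attributes to eliminating unique records.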
Privacy enhancing technologies (PETs) for connected vehicles in smart cities
This is an accepted manuscript of an article published by Wiley in Transactions on Emerging Telecommunications Technologies, available online: https://doi.org/10.1002/ett.4173
The accepted version of the publication may differ from the final published version.
Many experts believe that the Internet of Things (IoT) is a new technological revolution that has brought many benefits to organizations, businesses, and industries. However, information security and privacy protection remain important challenges, particularly for smart vehicles in smart cities, and have attracted the attention of experts in this domain. Privacy Enhancing Technologies (PETs) endeavor to mitigate the risk of privacy invasions, but the literature lacks a thorough review of the approaches and techniques that support individuals' privacy in the connection between smart vehicles and smart cities. This gap motivated us to conduct this research, whose main goal is to review recent privacy-enhancing technologies, approaches, taxonomy, challenges, and solutions in the application of PETs to smart vehicles in smart cities. The significance of this study lies in its inclusion of both data-oriented and process-oriented privacy protection. This research also identifies limitations of existing PETs, complementary technologies, and potential research directions.
Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption
The evolution of information and communication technologies has encouraged numerous organizations to outsource their business and data to the cloud for data mining and other data processing operations. Despite the great benefits of the cloud, it has a real problem with the security and privacy of data. Many studies have shown that attackers often expose information held by third-party services or third-party clouds. When data owners outsource their data to the cloud, especially under the SaaS cloud model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to accomplish data mining operations while protecting the owner's data from violation. The current PPDM models have some limitations. In particular, they suffer from identity and attribute disclosure, in which private information is revealed and various types of attack can succeed. In addition, existing solutions have poor data utility and high computational overhead. Therefore, this research aims to design and develop a Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model that improves the privacy-preserving level by reducing data disclosure before data are outsourced for mining over the cloud, while maintaining data utility. The proposed HAC-PPDM model further aims to reduce the computational overhead to improve efficiency. The Quasi-Identifiers Recognition (QIR) algorithm is defined and designed, based on attribute classification and determination of the quasi-identifier dimension, to overcome the identity disclosure caused by quasi-identifier linking and so reduce privacy leakage. An Enhanced Homomorphic Scheme is designed by hybridizing the Cloud-RSA encryption scheme, the Extended Euclidean algorithm (EE), the Fast Modular Exponentiation algorithm (FME), and the Chinese Remainder Theorem (CRT) to minimize computational time complexity while reducing attribute disclosure.
The proposed QIR algorithm, the Enhanced Homomorphic Scheme and the k-anonymity privacy model have been hybridized to obtain optimal privacy preservation before the data are outsourced to the cloud, while maintaining data utility that meets the needs of mining with good efficiency. Real-world datasets have been used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the data privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient than related works in terms of time complexity, as expressed in Big O notation. Moreover, it reduced the computation time of encryption, decryption, and key generation. Finally, the proposed HAC-PPDM model successfully reduced data disclosure and improved the privacy-preserving level while preserving data utility, as it reduced the information loss. In short, it improved privacy preservation and data mining (classification) accuracy by 7.59% and 0.11%, respectively.
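The building blocks named in this abstract (Extended Euclidean algorithm, fast modular exponentiation, Chinese Remainder Theorem) can be shown working together in a small Python sketch. It uses textbook RSA, which is multiplicatively homomorphic, as a stand-in; it is not the authors' Enhanced Homomorphic Scheme or Cloud-RSA, and the tiny primes, the exponent and all function names are illustrative assumptions that are far too small for real use.

```python
def extended_gcd(a, b):
    """Extended Euclidean algorithm: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def mod_inverse(a, m):
    """Modular inverse of a modulo m, derived from the extended Euclidean algorithm."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return x % m

def fast_pow(base, exp, mod):
    """Fast modular exponentiation by square-and-multiply."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:
            result = result * base % mod
        base = base * base % mod
        exp >>= 1
    return result

def crt_decrypt(c, p, q, d):
    """RSA decryption via the Chinese Remainder Theorem (Garner's recombination)."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = mod_inverse(q, p)
    m1 = fast_pow(c, dp, p)
    m2 = fast_pow(c, dq, q)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q

# Toy parameters (assumed for illustration only; insecure).
p, q, e = 61, 53, 17
n = p * q
d = mod_inverse(e, (p - 1) * (q - 1))

# Multiplicative homomorphism: Enc(a) * Enc(b) mod n decrypts to a * b mod n.
a, b = 7, 6
c_product = fast_pow(a, e, n) * fast_pow(b, e, n) % n
print(crt_decrypt(c_product, p, q, d))  # equals a * b while a * b < n
```

The CRT path replaces one exponentiation modulo `n` with two exponentiations modulo the smaller primes, which is the usual reason for combining CRT with RSA decryption; whether the paper gains its speed-up in the same way is not stated in the abstract.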
Method for constructing a temporalized data warehouse for a health information system
Health information systems (HIS) have been set up over the past 20 years to support care processes, administrative tasks and research activities, and to ensure the sound management of health care institutions. A data warehouse (DW) must be created from many heterogeneous data sources in order to make the data usable in a uniform way within HIS. Temporalizing this warehouse quickly became a crucial issue for keeping track of how the data evolve and for improving clinical decision-making. The temporalized data warehouse (TDW) requires the application of systematic rules to guarantee data integrity and quality. Generating the temporal schema of a TDW is a complex task. Several questions therefore arise, including: (a) Which temporal model is best suited to automating the construction of a TDW (particularly in the health domain)? (b) Which properties can be formally guaranteed following this construction? On the one hand, the size of the data schema demands substantial human and financial resources; on the other hand, several temporal models exist, but they are either not formalized or not general. Designers therefore most often rely on rules of practice that are varied, vague, incomplete and unvalidated. In this work, a reference framework is defined for formalizing, generalizing and operationalizing temporal models. Two models, BCDM and TRM, are presented according to the reference framework, together with their integrity constraints, their construction algorithms and a list of required extensions. As a result, it is now possible to dispense with imprecise rules of practice and to temporalize a warehouse using a rigorous method with provable properties based on fundamental criteria (relational theory), recognized and explicit design criteria (normalization)