5 research outputs found

    How will anonymization of simulated clinical data affect the data utility of pharmacoepidemiological studies?

    Background: The pressure to share more data and to be more transparent about clinical study reports has grown in recent years and has become an important topic. Before clinical data and clinical results can be shared, they must undergo anonymization. How anonymization of clinical data affects their utility is poorly studied, especially in pharmacoepidemiology. Objective: The aim of the study is to describe and evaluate how anonymization of simulated clinical data affects the data utility of pharmacoepidemiological analyses of these data. Method: We simulated five clinical datasets with different characteristics, associations, types of outcome, and study populations. Suppression, generalization, randomization, and k-anonymity were used as anonymization approaches. These methods were evaluated by comparing the data and the statistical results before and after anonymization. Result: K-anonymity and suppression affected the simulated clinical data the most, while generalization and randomization affected the data the least. With k-anonymity and suppression, there is a risk of overestimating the clinical results because unique records are eliminated. Generalization and randomization, on the other hand, preserved the most data utility but were less effective at anonymizing the data. Conclusion: Our study revealed that different anonymization approaches can affect clinical results differently. The more a record or attribute is anonymized, the less utility remains. It is therefore important to strike a balance between data utility and effectiveness of anonymization before clinical data are published. More investigation into how anonymization of clinical data affects data utility is needed in order to maximize the benefit of using anonymized clinical data to improve public health.
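    As a rough illustration only (not the authors' code or data), the Python sketch below applies generalization to quasi-identifiers, suppresses records whose quasi-identifier combination remains unique, and checks k-anonymity on a toy table; the column names, generalization rules, and the value of k are assumptions made for the example.

        # Minimal sketch: generalization, suppression, and a k-anonymity check
        # on a toy "clinical" table. All names and rules here are illustrative.
        from collections import Counter

        records = [
            {"age": 34, "zip": "75012", "exposed": True,  "outcome": 1},
            {"age": 36, "zip": "75011", "exposed": True,  "outcome": 0},
            {"age": 35, "zip": "75013", "exposed": False, "outcome": 0},
            {"age": 52, "zip": "69001", "exposed": False, "outcome": 1},
        ]

        def generalize(rec):
            # Generalization: coarsen quasi-identifiers (10-year age bands, 3-digit ZIP prefix).
            band = rec["age"] // 10 * 10
            return {**rec, "age": f"{band}-{band + 9}", "zip": rec["zip"][:3] + "**"}

        def is_k_anonymous(recs, quasi_ids, k):
            # Every combination of quasi-identifier values must occur at least k times.
            groups = Counter(tuple(r[q] for q in quasi_ids) for r in recs)
            return all(count >= k for count in groups.values())

        quasi_ids, k = ("age", "zip"), 2
        generalized = [generalize(r) for r in records]
        # Suppression: drop records whose quasi-identifier combination is still rare;
        # this is where unique (often extreme) records vanish and estimates can shift.
        groups = Counter(tuple(r[q] for q in quasi_ids) for r in generalized)
        released = [r for r in generalized if groups[tuple(r[q] for q in quasi_ids)] >= k]

        print(is_k_anonymous(released, quasi_ids, k))  # True once unique rows are suppressed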

    Privacy enhancing technologies (PETs) for connected vehicles in smart cities

    This is an accepted manuscript of an article published by Wiley in Transactions on Emerging Telecommunications Technologies, available online: https://doi.org/10.1002/ett.4173. The accepted version of the publication may differ from the final published version. Many experts believe that the Internet of Things (IoT) is a new technological revolution that has brought many benefits to organizations, businesses, and industries. However, information security and privacy protection are important challenges, particularly for smart vehicles in smart cities, and they have attracted the attention of experts in this domain. Privacy Enhancing Technologies (PETs) endeavor to mitigate the risk of privacy invasions, but the literature lacks a thorough review of the approaches and techniques that support individuals' privacy in the connection between smart vehicles and smart cities. This gap has stimulated us to conduct this research with the main goal of reviewing recent privacy-enhancing technologies, approaches, taxonomies, challenges, and solutions for the application of PETs to smart vehicles in smart cities. A significant aspect of this study is its inclusion of both data-oriented and process-oriented privacy protection. This research also identifies limitations of existing PETs, complementary technologies, and potential research directions.

    Cloud based privacy preserving data mining model using hybrid k-anonymity and partial homomorphic encryption

    The evolution of information and communication technologies has encouraged numerous organizations to outsource their business and data to cloud computing to perform data mining and other data processing operations. Despite the great benefits of the cloud, it poses real problems for the security and privacy of data. Many studies have shown that attackers often obtain information from third-party services or third-party clouds. When data owners outsource their data to the cloud, especially under the SaaS cloud model, it is difficult to preserve the confidentiality and integrity of the data. Privacy-Preserving Data Mining (PPDM) aims to accomplish data mining operations while protecting the owner's data from violation. Current PPDM models have limitations: they suffer from data disclosure caused by identity and attribute disclosure, in which private information is revealed and various types of attacks succeed. In addition, existing solutions have poor data utility and high computational overhead. Therefore, this research aims to design and develop a Hybrid Anonymization Cryptography PPDM (HAC-PPDM) model to improve the privacy-preserving level by reducing data disclosure before outsourcing data for mining over the cloud, while maintaining data utility. The proposed HAC-PPDM model further aims to reduce the computational overhead to improve efficiency. A Quasi-Identifiers Recognition (QIR) algorithm is defined and designed, based on attribute classification and quasi-identifier dimension determination, to overcome the identity disclosure caused by quasi-identifier linking and thereby reduce privacy leakage. An Enhanced Homomorphic Scheme is designed by hybridizing the Cloud-RSA encryption scheme, the Extended Euclidean algorithm (EE), the Fast Modular Exponentiation algorithm (FME), and the Chinese Remainder Theorem (CRT) to minimize computational time complexity while reducing attribute disclosure. The proposed QIR algorithm, Enhanced Homomorphic Scheme, and k-anonymity privacy model have been combined to obtain optimal privacy preservation before the data are outsourced to the cloud, while maintaining data utility that meets the needs of mining with good efficiency. Real-world datasets have been used to evaluate the proposed algorithms and model. The experimental results show that the proposed QIR algorithm improved the privacy-preserving percentage by 23% while maintaining the same or slightly better data utility. Meanwhile, the proposed Enhanced Homomorphic Scheme is more efficient than related works in terms of time complexity, as expressed in Big O notation. Moreover, it reduced the computational time of encryption, decryption, and key generation. Finally, the proposed HAC-PPDM model successfully reduced data disclosure and improved the privacy-preserving level while preserving data utility, as it reduced information loss. In short, it improved privacy preservation and data mining (classification) accuracy by 7.59% and 0.11%, respectively.
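    The Enhanced Homomorphic Scheme builds on Cloud-RSA, a partially homomorphic (multiplicative) cryptosystem. The sketch below is not the HAC-PPDM implementation and uses toy key sizes; it only illustrates the property such schemes rely on, namely that multiplying ciphertexts yields an encryption of the product of the plaintexts, so the cloud can compute on data it cannot read.

        # Minimal sketch of RSA's multiplicative homomorphic property:
        # Enc(a) * Enc(b) mod n decrypts to (a * b) mod n. Toy primes only.
        p, q = 61, 53                  # real schemes use large primes
        n = p * q                      # modulus (3233)
        phi = (p - 1) * (q - 1)        # 3120
        e = 17                         # public exponent, gcd(e, phi) == 1
        d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

        def enc(m): return pow(m, e, n)
        def dec(c): return pow(c, d, n)

        a, b = 12, 7
        # The cloud multiplies ciphertexts without ever seeing a or b ...
        c_prod = (enc(a) * enc(b)) % n
        # ... and the data owner decrypts the product of the plaintexts.
        assert dec(c_prod) == (a * b) % n
        print(dec(c_prod))  # 84

    In practice, CRT-based decryption and fast modular exponentiation (as cited in the abstract) speed up the pow operations; the toy example above omits those optimizations.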

    Method for constructing a temporalized data warehouse for a health information system

    Health information systems (HIS) have been deployed over the past 20 years to support care processes, administrative tasks, and research activities, as well as to ensure the sound management of healthcare institutions. A data warehouse (DW) must be built from numerous heterogeneous data sources in order to make the data usable in a uniform way within these systems. Temporalizing this warehouse quickly became a crucial issue in order to keep track of how data evolve and to improve clinical decision-making. A temporalized data warehouse (TDW) requires the application of systematic rules to guarantee data integrity and quality. Generating the temporal schema of a TDW is a complex task, and several questions arise, including: (a) Which temporal model is best suited to automating the construction of a TDW (particularly in the healthcare domain)? (b) Which properties can be formally guaranteed as a result of this construction? On the one hand, the size of the data schema requires substantial human and financial resources; on the other hand, several temporal models exist, but they are either not formalized or not general. Designers therefore most often rely on varied, vague, incomplete, and unvalidated rules of practice. In this work, a reference framework for formalizing, generalizing, and operationalizing temporal models is defined. Two models, BCDM and TRM, are presented within this framework, along with their integrity constraints, their construction algorithms, and a list of required extensions. As a result, it is now possible to move beyond imprecise rules of practice and to temporalize a warehouse using a rigorous method with provable properties, grounded in fundamental criteria (relational theory) and in recognized, explicit design criteria (normalization).
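    BCDM is a bitemporal model: each fact carries a valid-time interval (when it holds in reality) and a transaction-time interval (when the warehouse recorded it). The Python sketch below is a minimal illustration of that idea, not the thesis's construction algorithms; the field names and the query helper are assumptions made for the example.

        # Minimal sketch of a BCDM-style bitemporal fact with valid time and transaction time.
        from dataclasses import dataclass
        from datetime import date

        FOREVER = date.max  # "until changed" upper bound

        @dataclass
        class BitemporalFact:
            patient_id: str
            diagnosis: str
            valid_from: date   # valid time: when the fact holds clinically
            valid_to: date
            tx_from: date      # transaction time: when the warehouse stored/superseded it
            tx_to: date

        def facts_as_of(facts, as_of: date):
            # What did the warehouse believe at 'as_of', about facts valid at 'as_of'?
            return [f for f in facts
                    if f.tx_from <= as_of < f.tx_to
                    and f.valid_from <= as_of < f.valid_to]

        history = [
            BitemporalFact("p1", "hypertension", date(2019, 1, 1), FOREVER,
                           date(2019, 1, 5), date(2020, 3, 1)),   # superseded record
            BitemporalFact("p1", "hypertension, controlled", date(2020, 2, 1), FOREVER,
                           date(2020, 3, 1), FOREVER),            # current record
        ]
        print(facts_as_of(history, date(2021, 6, 1)))  # returns only the current record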