
    k-anonymous Microdata Release via Post Randomisation Method

    The problem of releasing anonymized microdata is an important topic in the fields of statistical disclosure control (SDC) and privacy-preserving data publishing (PPDP), yet it remains far from fully solved. In these research fields, k-anonymity has been widely studied as an anonymity notion, mainly for deterministic anonymization algorithms, and some probabilistic relaxations have been developed. However, these relaxations are limited: they are either weaker than the original k-anonymity or require strong parametric assumptions. First, we propose Pk-anonymity, a new probabilistic k-anonymity, and prove that Pk-anonymity is a mathematical extension of k-anonymity rather than a relaxation. Furthermore, Pk-anonymity requires no parametric assumptions. This property is significant because it enables us to compare the privacy levels of probabilistic microdata release algorithms with those of deterministic ones. Second, we apply Pk-anonymity to the post randomization method (PRAM), an SDC algorithm based on randomization. PRAM is proven to satisfy Pk-anonymity in a controlled way, i.e., one can set PRAM's parameters so that Pk-anonymity is satisfied. PRAM is also known to satisfy ε-differential privacy, a recent, popular, and strong privacy notion. Our results therefore significantly strengthen PRAM, since they imply that it satisfies both important notions: k-anonymity and ε-differential privacy. (22 pages, 4 figures.)
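
    The abstract gives no pseudocode, so the following is a minimal sketch of plain PRAM on one categorical attribute, assuming a simple symmetric transition matrix; the paper's contribution is calibrating such parameters so that Pk-anonymity holds, which this toy does not attempt.

        import numpy as np

        # Illustrative PRAM sketch: each categorical value is re-drawn according
        # to a row-stochastic transition matrix P, where P[i, j] is the probability
        # that true category i is published as category j. The "keep with
        # probability p_keep, otherwise uniform" design is an assumed example,
        # not the paper's calibrated parameters.
        def pram(values, num_categories, p_keep=0.8, rng=None):
            rng = rng or np.random.default_rng()
            off_diag = (1.0 - p_keep) / (num_categories - 1)
            P = np.full((num_categories, num_categories), off_diag)
            np.fill_diagonal(P, p_keep)
            return np.array([rng.choice(num_categories, p=P[v]) for v in values])

        # Randomize a column of three-category quasi-identifier codes.
        print(pram(np.array([0, 1, 2, 2, 1, 0]), num_categories=3))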

    Secured Data Masking Framework and Technique for Preserving Privacy in a Business Intelligence Analytics Platform

    The main concept behind business intelligence (BI) is how to use data integrated across different business systems within an enterprise to make strategic decisions. It is difficult to map internal and external BI users to subsets of the enterprise’s data warehouse (DW), so protecting the privacy of this data while maintaining its utility is a challenging task. Today, such DW systems constitute one of the most serious privacy breach threats an enterprise might face when many internal users of different security levels have access to BI components. This thesis proposes a data masking framework (iMaskU: Identify, Map, Apply, Sign, Keep testing, Utilize) for a BI platform to protect data at rest, preserve the data format, and maintain data utility for on-the-fly querying. A new reversible data masking technique (COntent BAsed Data masking, COBAD) is developed as an implementation of iMaskU. The masking algorithm in COBAD is based on the statistical content of the extracted dataset, so that the masked data cannot be linked to specific individuals or re-identified by any means. The re-identification risk of the COBAD technique was computed on a supercomputer against three attack methods: (a) a brute-force attack needs, on average, 55 years to crack the key of each record; (b) a dictionary attack needs 231 days to crack the same key for the entire extracted dataset (containing 50,000 records); (c) under a data linkage attack, the re-identification risk is very low when the commonly linked attributes are used. The performance of the COBAD masking technique was validated with a 1 GB database schema in the TPC-H decision support benchmark; the execution times of the selected TPC-H queries show that COBAD is much faster than AES-128 and 3DES encryption. Theoretical and experimental results show that the proposed solution provides a reasonable trade-off between data security and data utility.
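
    COBAD's concrete algorithm is not described in the abstract, so the sketch below only illustrates the general notion of reversible, format-preserving masking that the thesis targets: digits are substituted through a permutation of 0-9 derived from a secret key, so masked values keep their length and character classes and can be unmasked with the same key. The key derivation and the single fixed permutation are simplifying assumptions for illustration, not the COBAD construction and not a secure design.

        import hashlib

        # Hypothetical sketch of reversible, format-preserving masking (not COBAD):
        # every digit is substituted via a keyed permutation of 0-9, so the masked
        # value keeps its length and format and is recoverable with the same key.
        def _digit_perm(key: bytes):
            digest = hashlib.sha256(key).digest()
            perm = sorted(range(10), key=lambda d: digest[d])  # keyed permutation
            inverse = [0] * 10
            for i, d in enumerate(perm):
                inverse[d] = i
            return perm, inverse

        def mask(value: str, key: bytes) -> str:
            perm, _ = _digit_perm(key)
            return "".join(str(perm[int(c)]) if c.isdigit() else c for c in value)

        def unmask(value: str, key: bytes) -> str:
            _, inverse = _digit_perm(key)
            return "".join(str(inverse[int(c)]) if c.isdigit() else c for c in value)

        key = b"demo-secret"
        assert unmask(mask("4111-2026-0099", key), key) == "4111-2026-0099"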

    Can Our Health Data Stay Private? A Review and Future Directions for IS Research on Privacy-Preserving AI in Healthcare

    The generation of data has become one of the main drivers of modern healthcare. As in other industries, healthcare data is growing in both volume and diversity. Thus, Artificial Intelligence (AI) is increasingly being used as a tool to turn this body of healthcare data into real value. But with AI and big data come big risks, especially to data privacy. Privacy-preserving AI techniques are gaining popularity as a way to exploit the potential of AI without compromising patient privacy. However, there is no clear understanding of the current research space of applying such privacy-preserving techniques in healthcare. This paper aims to provide an understanding of these techniques and investigates the emerging research field of privacy-preserving AI and its use in healthcare by reviewing the current multidisciplinary research to synthesize knowledge and derive future research directions.

    Enabling Multi-level Trust in Privacy Preserving Data Mining

    Privacy Preserving Data Mining (PPDM) addresses the problem of developing accurate models about aggregated data without access to precise information in individual data records. A widely studied perturbation-based PPDM approach introduces random perturbation to individual values to preserve privacy before data is published. Previous solutions of this approach are limited by their tacit assumption of a single level of trust in data miners. In this work, we relax this assumption and expand the scope of perturbation-based PPDM to Multi-Level Trust (MLT-PPDM). In our setting, the more trusted a data miner is, the less perturbed a copy of the data it can access. Under this setting, a malicious data miner may have access to differently perturbed copies of the same data through various means, and may combine these diverse copies to jointly infer additional information about the original data that the data owner does not intend to release. Preventing such diversity attacks is the key challenge of providing MLT-PPDM services. We address this challenge by properly correlating perturbation across copies at different trust levels. We prove that our solution is robust against diversity attacks with respect to our privacy goal: for data miners who have access to an arbitrary collection of the perturbed copies, our solution prevents them from jointly reconstructing the original data more accurately than the best effort using any individual copy in the collection. Our solution allows a data owner to generate perturbed copies of its data for arbitrary trust levels on demand, which offers data owners maximum flexibility. (20 pages, 5 figures. Accepted for publication in IEEE Transactions on Knowledge and Data Engineering.)
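
    As a minimal sketch of the correlated-perturbation idea, assuming additive Gaussian noise: the noise in a less trusted (more perturbed) copy is generated as the noise of the more trusted copy plus fresh independent noise, so pooling several copies cannot reconstruct the data better than using the least perturbed copy alone. The nesting below illustrates the principle only; it is not the paper's exact noise-covariance construction.

        import numpy as np

        # Sketch: perturbed copies with nested (hence correlated) Gaussian noise.
        # sigmas are noise standard deviations in increasing order, so copies[0]
        # is the least perturbed copy, intended for the most trusted miner.
        def perturbed_copies(data, sigmas, rng=None):
            rng = rng or np.random.default_rng()
            noise = np.zeros_like(data, dtype=float)
            var = 0.0
            copies = []
            for sigma in sigmas:
                extra = sigma**2 - var  # only the incremental variance is fresh
                noise = noise + rng.normal(0.0, np.sqrt(extra), size=data.shape)
                var = sigma**2
                copies.append(data + noise)
            return copies

        x = np.array([10.0, 20.0, 30.0])
        most_trusted, middle, least_trusted = perturbed_copies(x, sigmas=[1.0, 2.0, 4.0])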

    Cloud BI: A Multi-party Authentication Framework for Securing Business Intelligence on the Cloud

    Business intelligence (BI) has emerged as a key technology to be hosted on Cloud computing. BI offers a method to analyse data, thereby enabling informed decision making to improve business performance and profitability. However, within the shared domains of Cloud computing, BI is exposed to increased security and privacy threats because an unauthorised user may be able to gain access to highly sensitive, consolidated business information. A business process contains collaborating services and users from multiple Cloud systems in different security realms, which need to be engaged dynamically at runtime. If heterogeneous Cloud systems located in different security realms do not have direct authentication relationships, then it is technically difficult to enable secure collaboration. To address these security challenges, a new authentication framework is required that establishes trust relationships among these BI service instances and users by distributing a common session secret to all participants of a session. The author addresses this challenge by designing and implementing a multi-party authentication framework for dynamic secure interactions when members of different security realms want to access services. The framework takes advantage of the trust relationships between session members in different security realms to enable a user to obtain security credentials to access Cloud resources in a remote realm. This mechanism helps Cloud session users authenticate their session membership, improving the authentication processes within multi-party sessions. The correctness of the proposed framework has been verified using BAN logic, and the performance and overhead have been evaluated via simulation in a dynamic environment. A prototype authentication system has been designed, implemented, and tested based on the proposed framework. The research concludes that the proposed framework and its supporting protocols are an effective functional basis for practical implementation testing, as the framework achieves good scalability and imposes only minimal performance overhead, comparable with other state-of-the-art methods.
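
    As a rough, hypothetical illustration of the core mechanism (distributing a common session secret and authenticating session membership), the sketch below shows a session authority that issues each member an HMAC credential over its identity under the shared secret. The class and identifier names are assumptions for illustration; the thesis's actual cross-realm protocol is considerably richer than this toy.

        import hashlib
        import hmac
        import secrets

        # Hypothetical sketch: a session authority holds the common session secret
        # and issues HMAC credentials that prove membership in the session.
        class SessionAuthority:
            def __init__(self):
                self.session_key = secrets.token_bytes(32)  # common session secret
                self.members = set()

            def enroll(self, member_id: str) -> bytes:
                # Credential binding this member's identity to the session secret.
                self.members.add(member_id)
                return hmac.new(self.session_key, member_id.encode(), hashlib.sha256).digest()

            def verify(self, member_id: str, tag: bytes) -> bool:
                expected = hmac.new(self.session_key, member_id.encode(), hashlib.sha256).digest()
                return member_id in self.members and hmac.compare_digest(expected, tag)

        authority = SessionAuthority()
        credential = authority.enroll("bi-service@realm-A")
        assert authority.verify("bi-service@realm-A", credential)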

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity create a need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond helping readers understand each section deeply, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    Marginal Release Under Local Differential Privacy

    Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
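
    To make the local model concrete, here is a minimal sketch of its simplest instance, assuming a single binary attribute: each user perturbs their own bit with randomized response, and the aggregator debiases the noisy mean to estimate the one-way marginal. The paper's Fourier-based mechanisms generalize far beyond this base case; the toy shows only the privatize-then-debias pattern.

        import numpy as np

        # Randomized response for one bit: report the truth with probability
        # e^eps / (e^eps + 1), which satisfies eps-local differential privacy.
        def randomized_response(bit, epsilon, rng):
            p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
            return bit if rng.random() < p_truth else 1 - bit

        # Debias the observed mean: E[report] = q*(2p - 1) + (1 - p) for true mean q.
        def estimate_marginal(reports, epsilon):
            p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
            return (np.mean(reports) + p - 1.0) / (2.0 * p - 1.0)

        rng = np.random.default_rng(0)
        true_bits = (rng.random(100_000) < 0.3).astype(int)
        reports = [randomized_response(b, 1.0, rng) for b in true_bits]
        print(estimate_marginal(reports, 1.0))  # close to the true marginal 0.3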