
    Disclosure Risk from Homogeneity Attack in Differentially Private Frequency Distribution

    Differential privacy (DP) provides a robust model for achieving privacy guarantees on released information. We examine how well multi-dimensional frequency distributions sanitized via DP randomization mechanisms protect against the homogeneity attack (HA), which allows adversaries to learn the exact values of sensitive attributes for their targets without having to identify them in the released data. We propose measures of disclosure risk from HA and derive closed-form relationships between the privacy loss parameters in DP and that risk. These closed-form relationships aid understanding of the abstract concepts of DP and privacy loss parameters by placing them in the context of a concrete privacy attack, and they offer a perspective for choosing privacy loss parameters when employing DP mechanisms to sanitize and release information in practice. We apply the closed-form relationships to real-life datasets to demonstrate the assessment of disclosure risk due to HA on differentially private sanitized frequency distributions at various privacy loss parameters.
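
    The abstract does not give the paper's closed-form risk measures, so the sketch below is only an illustration of the setting it describes: a two-dimensional frequency table is sanitized with the standard Laplace mechanism, and a stand-in homogeneity measure (the fraction of non-empty quasi-identifier groups whose entire mass sits on one sensitive value) is computed on the released table. `laplace_sanitize` and `homogeneity_risk` are hypothetical names, not the authors' definitions.

    ```python
    # Illustrative sketch only: the Laplace mechanism is standard, but
    # homogeneity_risk() is a stand-in measure, not the paper's closed form.
    import numpy as np

    rng = np.random.default_rng(0)

    def laplace_sanitize(counts, epsilon, sensitivity=1.0):
        """Release a frequency table under epsilon-DP via the Laplace mechanism."""
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=counts.shape)
        return np.clip(np.round(counts + noise), 0, None)  # non-negative integer counts

    def homogeneity_risk(counts):
        """Fraction of non-empty quasi-identifier groups (rows) whose entire
        mass falls on a single sensitive value (columns), i.e., groups for
        which a homogeneity attack reveals the sensitive attribute."""
        totals = counts.sum(axis=1)
        occupied = totals > 0
        homogeneous = occupied & (counts.max(axis=1) == totals)
        return homogeneous.sum() / max(occupied.sum(), 1)

    # Rows: quasi-identifier groups; columns: sensitive-attribute values.
    true_counts = np.array([[5., 0., 0.],
                            [2., 3., 1.],
                            [0., 7., 0.]])

    for eps in (0.1, 1.0, 10.0):
        sanitized = laplace_sanitize(true_counts, eps)
        print(f"epsilon={eps:5.1f}  apparent HA risk={homogeneity_risk(sanitized):.2f}")
    ```

    Sweeping the privacy budget makes the trade-off the paper formalizes visible: small budgets inject enough noise to perturb apparent homogeneity, while large budgets reproduce the true table together with its HA exposure.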

    Evaluating the risk of disclosure and utility in a synthetic dataset

    The advancement of information technology has improved the delivery of financial services through the introduction of financial technology (FinTech). To enhance customer satisfaction, FinTech companies leverage artificial intelligence (AI) to collect fine-grained data about individuals, which enables them to provide more intelligent and customized services. However, while such services promise to make our lives easier, they also raise major security and privacy concerns for their users. Differential privacy (DP) is a popular technique for protecting individual privacy while releasing data for public use. However, few research efforts have been devoted to balancing the resulting risk of data disclosure (RoD) against data utility. In this paper, we propose data-driven approaches to the differentially private release of data and evaluate the RoD. We develop algorithms to evaluate whether a differentially private synthetic dataset offers sufficient privacy. Beyond privacy, the utility of the synthetic dataset is an important metric for the differentially private release of data. We therefore propose a data-driven algorithm that uses curve fitting to measure and predict the error in statistical results incurred by adding random noise to the original dataset. We also present an algorithm for choosing an appropriate privacy budget ϵ to maintain the balance between privacy and utility. Our comprehensive experimental analysis demonstrates both the efficiency and the estimation accuracy of the proposed algorithms.
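
    The paper's algorithms are not detailed in the abstract, so the following is a minimal sketch of the curve-fitting idea under stated assumptions: the Laplace mechanism, mean absolute error as the error metric, and a hypothesized model error ~ a/ϵ + b. The grid of budgets, the model `error_model`, and the utility target `target_error` are all illustrative choices, not the authors' settings.

    ```python
    # Minimal sketch, assuming the Laplace mechanism and the error model
    # error ~ a/epsilon + b; the paper's actual fitting procedure and
    # budget-selection algorithm are not specified in the abstract.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(1)
    true_counts = rng.integers(0, 100, size=1000).astype(float)

    def mean_abs_error(epsilon, trials=50):
        """Empirical mean absolute error of Laplace-noised counts at a budget.
        Since the noise is additive, |noisy - true| equals |noise|."""
        errs = [np.mean(np.abs(rng.laplace(scale=1.0 / epsilon,
                                           size=true_counts.shape)))
                for _ in range(trials)]
        return float(np.mean(errs))

    def error_model(eps, a, b):
        return a / eps + b

    # Measure the error on a grid of privacy budgets, then fit the curve.
    eps_grid = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0])
    errors = np.array([mean_abs_error(e) for e in eps_grid])
    (a, b), _ = curve_fit(error_model, eps_grid, errors)

    # Choose the smallest epsilon whose predicted error meets a utility target.
    target_error = 2.0
    eps_star = a / (target_error - b) if target_error > b else float("inf")
    print(f"fitted error ~= {a:.2f}/eps + {b:.2f}; choose eps >= {eps_star:.2f}")
    ```

    For the Laplace mechanism the mean absolute error of a single count is exactly 1/ϵ, so the fit should recover a close to 1 and b close to 0; on real statistics, the fitted curve plays the predictive role the abstract describes, letting ϵ be chosen before any noisy release is made.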