Disclosure Risk from Homogeneity Attack in Differentially Private Frequency Distribution
Differential privacy (DP) provides a robust model for achieving privacy
guarantees on released information. We examine how well multi-dimensional
frequency distributions sanitized via DP randomization mechanisms protect
against the homogeneity attack (HA). HA allows adversaries to learn the exact
values of sensitive attributes for their targets without having to identify
them in the released data. We propose measures of the disclosure risk from HA
and derive closed-form relationships between the privacy loss parameters in DP
and that risk. These closed-form relationships aid understanding of the
abstract concepts of DP and its privacy loss parameters by placing them in the
context of a concrete privacy attack, and they offer a perspective for
choosing privacy loss parameters when employing DP mechanisms to sanitize and
release information in practice. We apply the closed-form relationships to
real-life datasets to demonstrate the assessment of disclosure risk due to HA
on differentially private sanitized frequency distributions at various privacy
loss parameters.
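
The abstract does not reproduce the paper's risk measures or closed-form
expressions. As a minimal illustration of how a privacy loss parameter maps to
an attacker's confidence, the sketch below uses k-ary randomized response, a
standard ϵ-DP mechanism in which a respondent reports their true category with
probability e^ϵ / (e^ϵ + k − 1), and computes a simple Bayesian posterior as a
stand-in for a disclosure-risk measure. The function names and the posterior
formulation are our own assumptions, not the authors' measures.

```python
import numpy as np

def true_report_probability(epsilon: float, k: int) -> float:
    """Probability that k-ary randomized response reports the true
    category: p = e^eps / (e^eps + k - 1). A standard eps-DP mechanism,
    used here only to illustrate how the privacy loss parameter bounds
    an attacker's confidence."""
    return np.exp(epsilon) / (np.exp(epsilon) + k - 1)

def posterior_disclosure_confidence(epsilon: float, k: int,
                                    prior: float) -> float:
    """Posterior probability that a target's sensitive value equals the
    reported value, given prior belief `prior` in that value. A crude
    stand-in for a disclosure-risk measure; the paper's own measures
    for the homogeneity attack differ."""
    p = true_report_probability(epsilon, k)
    q = (1.0 - p) / (k - 1)  # probability of each specific false report
    return (p * prior) / (p * prior + q * (1.0 - prior))

if __name__ == "__main__":
    # Smaller epsilon keeps the attacker's posterior near the prior;
    # larger epsilon lets it approach certainty.
    for eps in (0.1, 0.5, 1.0, 2.0, 5.0):
        conf = posterior_disclosure_confidence(eps, k=4, prior=0.25)
        print(f"epsilon={eps:4.1f}  attacker posterior={conf:.3f}")
```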
Evaluating the risk of disclosure and utility in a synthetic dataset
The advancement of information technology has improved the delivery
of financial services through the introduction of financial technology
(FinTech). To enhance customer satisfaction, FinTech companies leverage
artificial intelligence (AI) to collect fine-grained data about individuals,
which enables them to provide more intelligent and customized services.
However, although such services promise to make our lives easier, they also
raise major security and privacy concerns for their users. Differential
privacy (DP) is a popular technique for protecting individual privacy while
releasing data for public use. However, very few research efforts have been
devoted to maintaining a balance between the corresponding risk of data
disclosure (RoD) and data utility. In this paper, we propose data-driven
approaches for evaluating the RoD of differentially private data releases. We
develop algorithms to evaluate whether a differentially private synthetic
dataset offers sufficient privacy. In addition to privacy, the utility of the
synthetic dataset is an important metric for the differentially private
release of data. Thus, we propose a data-driven algorithm that uses curve
fitting to measure and predict the error in statistical results incurred by
adding random noise to the original dataset. We also present an algorithm for
choosing an appropriate privacy budget ϵ to maintain the balance between
privacy and utility. Our comprehensive experimental analysis demonstrates both
the efficiency and the estimation accuracy of the proposed algorithms.
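
The abstract does not spell out the algorithms. The sketch below illustrates
the general recipe under stated assumptions: add Laplace noise (a standard
ϵ-DP mechanism) to a count query, measure the resulting error empirically at
several privacy budgets, fit an error-versus-ϵ curve, and invert the fit to
pick the smallest ϵ meeting an error target. The c/ϵ error model, the dataset,
and all names are our own illustrative choices, not the authors' method.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def laplace_count(data: np.ndarray, epsilon: float) -> float:
    """Release len(data) under eps-DP via the Laplace mechanism;
    a count query has sensitivity 1, so the noise scale is 1/eps."""
    return len(data) + rng.laplace(scale=1.0 / epsilon)

def mean_abs_error(data: np.ndarray, epsilon: float,
                   trials: int = 200) -> float:
    """Empirical mean absolute error of the noisy count at a given eps."""
    true = len(data)
    return float(np.mean([abs(laplace_count(data, epsilon) - true)
                          for _ in range(trials)]))

def error_model(eps, c):
    """Assumed one-parameter error curve, err(eps) = c / eps; the
    Laplace mechanism's expected absolute error is exactly 1/eps."""
    return c / eps

if __name__ == "__main__":
    data = np.arange(10_000)                  # placeholder dataset
    budgets = np.array([0.1, 0.2, 0.5, 1.0, 2.0])
    errors = np.array([mean_abs_error(data, e) for e in budgets])

    # Fit the error curve, then invert err = c / eps to choose a budget
    # that keeps the predicted error under an illustrative target.
    (c_hat,), _ = curve_fit(error_model, budgets, errors)
    target_error = 2.0
    eps_needed = c_hat / target_error
    print(f"fitted c = {c_hat:.2f}; "
          f"eps for error <= {target_error}: {eps_needed:.2f}")
```

A smaller error target drives the chosen ϵ up, making the privacy-utility
trade-off explicit: the fitted curve lets one predict utility loss at budgets
that were never measured directly.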