Synthesizing Accurate Relational Data under Differential Privacy

Abstract

Medical data is sensitive personal data whose use is regulated under GDPR and HIPAA. Anonymizing this data prior to research would lower its sensitivity and thereby allow broader access. Privacy-aware data synthesis has been proposed as a solution. However, current algorithms have difficulty synthesizing medical data while maintaining both privacy and utility. This is due to the structure of medical data, which consists of multiple interlinked tables with high-dimensional columns that capture sequential aspects of the patient trajectory. The resulting number of correlations is intractable to model naively, and if relational correlations are not accounted for, the synthesized data has poor utility (e.g., it contains invalid patient trajectories). In this paper, we present MARE, a relational synthesis algorithm that focuses on a set of core correlations found in relational data while pruning others. The resulting lower computational complexity allows MARE to produce accurate relational data. We show that MARE can synthesize multiple medical datasets containing sequential aspects while maintaining utility, in the form of inter-table and inter-row correlations, as well as privacy guarantees.

VBN (Videnbasen) Aalborg Universitets forskningsportal

Last time updated on 05/03/2025
