Preserving Privacy in High-Dimensional Data Publishing

Abstract

Information technology continues to expand, with ever-greater computational power, storage capacity, and mobility. It grows more pervasive by the day and has enhanced many aspects of our daily lives. GPS-equipped devices, smart card automated fare collection systems, and sensory technology are but a few examples of advanced, yet affordable, data-generating technologies that are an integral part of modern society. To enhance user experience or provide better services, service providers rely on collecting person-specific information from users. The collected data is then studied and analyzed in order to extract useful information. It is common practice for the collected data to be shared with a third party, e.g., a data mining firm, for analysis. However, the shared data must not leak sensitive information about the individuals to whom the data belongs or reveal their identity. In other words, individuals’ privacy must be protected in the published data. Privacy-preserving data publishing is a research area that studies how to anonymize person-specific data without compromising its utility for future data analysis. This thesis studies and proposes anonymization solutions for three types of high-dimensional data: trajectory streams, static trajectories, and relational data. We demonstrate through theoretical and experimental analysis that our proposed solutions, for the most part, outperform state-of-the-art methods in terms of utility, efficiency, and scalability.
