Defining Privacy and Utility in Data Sets

Abstract

Is it possible to release useful data while preserving the privacy of the individuals whose information is in the database? This question has been the subject of considerable controversy, particularly in the wake of well-publicized instances in which researchers showed how to re-identify individuals in supposedly anonymous data. Some have argued that privacy and utility are fundamentally incompatible, while others have suggested that simple steps can be taken to achieve both simultaneously. Both sides have looked to the computer science literature for support. What the existing debate has overlooked, however, is that the relationship between privacy and utility depends crucially on what one means by privacy and what one means by utility. Apparently contradictory results in the computer science literature can be explained by the use of different definitions to formalize these concepts. Without sufficient attention to these definitional issues, it is all too easy to overgeneralize the technical results. More importantly, definitions of privacy and utility differ from one another in nuanced ways, and those nuances determine why a definition that is appropriate in one context may not be appropriate in another. Analyzing these nuances exposes the policy choices inherent in the choice of one definition over another and thereby elucidates decisions about whether and how to regulate data privacy across varying social contexts.
