
    Publishing Microdata with a Robust Privacy Guarantee

    Today, the publication of microdata poses a privacy threat. A vast body of research has striven to define the privacy condition that microdata should satisfy before release, and to devise algorithms that anonymize the data so as to achieve this condition. Yet no method proposed to date explicitly bounds the percentage of information an adversary gains after seeing the published data for each sensitive value therein. This paper introduces beta-likeness, an appropriately robust privacy model for microdata anonymization, along with two anonymization schemes designed for it: one based on generalization and the other on perturbation. Our model postulates that an adversary's confidence in the likelihood of a certain sensitive-attribute (SA) value should not increase, in relative-difference terms, by more than a predefined threshold. Our techniques aim to satisfy a given beta threshold with little information loss. We experimentally demonstrate that (i) our model provides an effective privacy guarantee in a way that predecessor models cannot, (ii) our generalization scheme is more effective and efficient in its task than methods adapting algorithms for the k-anonymity model, and (iii) our perturbation method outperforms a baseline approach. Moreover, we discuss in detail the resistance of our model and methods to attacks proposed in previous research. Comment: VLDB201
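The abstract states the core condition precisely: within every published group, the relative increase of a sensitive value's frequency over its table-wide frequency must stay below beta. The paper's actual algorithms are not given in the abstract; the sketch below only checks that condition on a toy table, with the function name and data layout chosen here for illustration.

```python
from collections import Counter

def satisfies_beta_likeness(all_sa_values, groups, beta):
    """Check the beta-likeness condition: in each anonymized group, the
    relative increase of every sensitive value's in-group frequency over
    its overall frequency must not exceed beta."""
    n = len(all_sa_values)
    overall = {s: c / n for s, c in Counter(all_sa_values).items()}
    for group in groups:
        for s, c in Counter(group).items():
            q = c / len(group)          # in-group frequency
            p = overall[s]              # prior (table-wide) frequency
            if (q - p) / p > beta:      # relative-difference gain
                return False
    return True
```

For example, a grouping that doubles an adversary's belief in a rare value satisfies beta = 1.0 but violates beta = 0.5.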

    Towards trajectory anonymization: a generalization-based approach

    Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adapt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for the anonymization of trajectories. We further show that releasing anonymized trajectories may still leak private information. We therefore propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data, and also show how the underlying techniques can be adapted to other anonymity standards. Experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
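To make the two ingredients concrete, here is a minimal sketch of (a) spatial generalization of GPS points to grid cells and (b) the resulting k-anonymity check over whole trajectories. The grid-cell generalization and the function names are illustrative assumptions, not the paper's actual algorithm.

```python
from collections import Counter

def generalize_point(point, cell):
    """Coarsen a GPS point to a grid cell of side length `cell`
    (a toy spatial generalization; cell size is an assumed parameter)."""
    x, y = point
    return (int(x // cell), int(y // cell))

def is_k_anonymous(trajectories, k):
    """A released set of (generalized) trajectories is k-anonymous if
    each trajectory is identical to at least k-1 others."""
    counts = Counter(tuple(t) for t in trajectories)
    return all(c >= k for c in counts.values())
```

Two raw trajectories that differ in exact coordinates can become indistinguishable once both are snapped to the same cells, which is the intuition behind generalization-based trajectory anonymization.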

    Citizens and Institutions as Information Prosumers. The Case Study of Italian Municipalities on Twitter

    The aim of this paper is to address changes in public communication following the advent of Internet social-networking tools and the emerging Web 2.0 technologies, which are providing new ways of sharing information and knowledge. In particular, public administrations are called upon to reinvent the governance of public affairs and to update the means of interacting with their communities. The paper analyzes the distribution, diffusion, and performance of the official Twitter profiles adopted by Italian municipalities (comuni) up to November 2013. It aims to identify the patterns of spatial distribution and the drivers of the diffusion of Twitter profiles, and to assess the performance of the profiles through an aggregated index, called the Twitter performance index (Twiperindex), which evaluates a profile's activity with reference to the gravitational area of its municipality, enabling comparisons between municipalities of different demographic sizes and functional roles. The results show that only a small portion of innovative municipalities have adopted Twitter to enhance e-participation and e-governance, and that the drivers of diffusion seem to be related either to past experiences and existing conditions (e.g., civic networks, digital infrastructures) developed over time or to strong local community awareness. The better performances are achieved mainly by small and medium-sized municipalities. The phenomenon is very new and fluid, so this analysis should be considered a first step in ongoing research that aims to grasp the dynamics of these new means of public communication.

    Microdata protection through approximate microaggregation

    Microdata protection is a hot topic in the field of Statistical Disclosure Control, which has gained special interest after the disclosure of the search queries of some 658,000 users by the America Online (AOL) search engine in August 2006. Many algorithms, methods, and properties have been proposed to deal with microdata disclosure. One of the emerging concepts in microdata protection is k-anonymity, introduced by Samarati and Sweeney. k-anonymity provides a simple and efficient approach to protecting private individual information and is gaining increasing popularity. It requires that every record in the released microdata table be indistinguishably related to no fewer than k respondents. In this paper, we apply the concept of entropy to propose a distance metric that evaluates the amount of mutual information among records in microdata, and we propose a method of constructing a dependency tree to find the key attributes, which we then use to perform approximate microaggregation. We further adopt this new microaggregation technique to study the k-anonymity problem and develop an efficient algorithm. Experimental results show that the proposed microaggregation technique is efficient and effective in terms of running time and information loss.
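The entropy-based metric rests on empirical mutual information between attribute columns. The abstract does not give the paper's exact formula, so the sketch below is the standard plug-in estimator I(X;Y) = sum p(x,y) log2(p(x,y) / (p(x)p(y))), which is one plausible building block for the dependency tree it describes.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two
    categorical attribute columns of equal length."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p = c / n
        # Each joint outcome contributes p * log2(p / (p(x) * p(y))).
        mi += p * math.log2(p / ((px[x] / n) * (py[y] / n)))
    return mi
```

Perfectly correlated columns yield 1 bit for a uniform binary attribute, while independent columns yield 0; a dependency tree would keep the edges with the highest such scores.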

    GLOVE: towards privacy-preserving publishing of record-level-truthful mobile phone trajectories

    Datasets of mobile phone trajectories collected by network operators offer an unprecedented opportunity to discover new knowledge from the activity of large populations of millions. However, publishing such trajectories also raises significant privacy concerns, as they contain personal data in the form of individual movement patterns. Privacy risks induce network operators to enforce restrictive confidential agreements in the rare occasions when they grant access to collected trajectories, whereas a less involved circulation of these data would fuel research and enable reproducibility in many disciplines. In this work, we contribute a building block toward the design of privacy-preserving datasets of mobile phone trajectories that are truthful at the record level. We present GLOVE, an algorithm that implements k-anonymity, hence solving the crucial unicity problem that affects this type of data while ensuring that the anonymized trajectories correspond to real-life users. GLOVE builds on original insights about the root causes behind the undesirable unicity of mobile phone trajectories, and leverages generalization and suppression to remove them. Proof-of-concept validations with large-scale real-world datasets demonstrate that the approach adopted by GLOVE allows preserving a substantial level of accuracy in the data, higher than that granted by previous methodologies. This work was supported by the Atracción de Talento Investigador program of the Comunidad de Madrid under Grant No. 2019-T1/TIC-16037 NetSense.

    p-probabilistic k-anonymous microaggregation for the anonymization of surveys with uncertain participation

    We develop a probabilistic variant of k-anonymous microaggregation, which we term p-probabilistic, resorting to a statistical model of respondent participation in order to aggregate quasi-identifiers in such a manner that k-anonymity is enforced with a parametric probabilistic guarantee. Succinctly, owing to the possibility that some respondents may not finally participate, sufficiently larger cells are created, striving to satisfy k-anonymity with probability at least p. The microaggregation function is designed before the respondents submit their confidential data. More precisely, a specification of the function is sent to them, which they may verify and apply to their quasi-identifying demographic variables prior to submitting the microaggregated data, along with the confidential attributes, to an authorized repository. We propose a number of metrics to assess the performance of our probabilistic approach in terms of anonymity and distortion, which we investigate theoretically in depth and empirically with synthetic and standardized data. We stress that, in addition to constituting a functional extension of traditional microaggregation that broadens its applicability to the anonymization of statistical databases in a wide variety of contexts, the relaxation of trust assumptions is arguably expected to have a considerable impact on user acceptance and, ultimately, on data utility through mere availability.
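One natural reading of "sufficiently larger cells" is a binomial sizing rule: if each respondent participates independently with probability r, pick the smallest cell size n such that at least k respondents actually participate with probability at least p. This is an illustrative interpretation of the guarantee, not the paper's stated method; the participation probability r is an assumed input.

```python
from math import comb

def binom_tail_geq(n, r, k):
    """P[Binomial(n, r) >= k]: probability that at least k of n
    respondents participate, each independently with probability r."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

def min_cell_size(k, r, p):
    """Smallest cell size n so that k-anonymity survives uncertain
    participation with probability at least p (illustrative sketch)."""
    n = k
    while binom_tail_geq(n, r, k) < p:
        n += 1
    return n
```

With certain participation (r = 1) the rule degenerates to classical cells of size k; with r = 0.5 and p = 0.9, a cell must hold three times as many candidates to protect k = 3 with high probability.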

    Creation of public use files: lessons learned from the comparative effectiveness research public use files data pilot project

    In this paper we describe lessons learned from the creation of Basic Stand Alone (BSA) Public Use Files (PUFs) for the Comparative Effectiveness Research Public Use Files Data Pilot Project (CER-PUF). CER-PUF is aimed at increasing access to the Centers for Medicare and Medicaid Services (CMS) Medicare claims datasets through PUFs that: do not require user fees and data use agreements, have been de-identified to assure the confidentiality of the beneficiaries and providers, and still provide substantial analytic utility to researchers. For this paper we define PUFs as datasets characterized by free and unrestricted access to any user. We derive lessons learned from five major project activities: (i) a review of the statistical and computer science literature on best practices in PUF creation, (ii) interviews with comparative effectiveness researchers to assess their data needs, (iii) case studies of PUF initiatives in the United States, (iv) interviews with stakeholders to identify the most salient issues regarding making microdata publicly available, and (v) the actual process of creating the Medicare claims data BSA PUFs.

    Measuring internet activity: a (selective) review of methods and metrics

    Two decades after the birth of the World Wide Web, more than two billion people around the world are Internet users. The digital landscape is littered with hints that the affordances of digital communications are being leveraged to transform life in profound and important ways. The reach and influence of digitally mediated activity grow by the day and touch upon all aspects of life, from health, education, and commerce to religion and governance. This trend demands that we seek answers to the biggest questions about how digitally mediated communication changes society and the role of different policies in helping or hindering the beneficial aspects of these changes. Yet despite the profusion of data the digital age has brought upon us—we now have access to a flood of information about the movements, relationships, purchasing decisions, interests, and intimate thoughts of people around the world—the distance between the great questions of the digital age and our understanding of the impact of digital communications on society remains large. A number of ongoing policy questions have emerged that beg for better empirical data and analyses upon which to base wider and more insightful perspectives on the mechanics of social, economic, and political life online. This paper seeks to describe the conceptual and practical impediments to measuring and understanding digital activity and highlights a sample of the many efforts to fill the gap between our incomplete understanding of digital life and the formidable policy questions related to developing a vibrant and healthy Internet that serves the public interest and contributes to human wellbeing. Our primary focus is on efforts to measure Internet activity, as we believe obtaining robust, accurate data is a necessary and valuable first step that will lead us closer to answering the vitally important questions of the digital realm.
Even this step is challenging: the Internet is difficult to measure and monitor, and there is no simple aggregate measure of Internet activity—no GDP, no HDI. In the following section we present a framework for assessing efforts to document digital activity. The next three sections offer a summary and description of many of the ongoing projects that document digital activity, with two final sections devoted to discussion and conclusions.

    PerfWeb: How to Violate Web Privacy with Hardware Performance Events

    The browser history reveals highly sensitive information about users, such as financial status, health conditions, or political views. Private browsing modes and anonymity networks are consequently important tools to preserve the privacy not only of regular users but in particular of whistleblowers and dissidents. Yet, in this work we show how a malicious application can infer opened websites from Google Chrome in Incognito mode and from Tor Browser by exploiting hardware performance events (HPEs). In particular, we analyze the browsers' microarchitectural footprint with the help of advanced machine learning techniques: k-th nearest neighbors, decision trees, support vector machines, and, in contrast to previous literature, also convolutional neural networks. We profile 40 different websites, 30 of the top Alexa sites and 10 whistleblowing portals, on two machines featuring an Intel and an ARM processor. By monitoring retired instructions, cache accesses, and bus cycles for at most 5 seconds, we manage to classify the selected websites with a success rate of up to 86.3%. The results show that hardware performance events can clearly undermine the privacy of web users. We therefore propose mitigation strategies that impede our attacks and still allow legitimate use of HPEs.
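The simplest of the classifiers the paper evaluates is nearest neighbors over per-time-slice event counts. The toy sketch below shows the shape of such an attack with a stdlib-only k-NN vote; the feature vectors here are made up, where a real attack would fill them from hardware performance counter traces.

```python
import math

def knn_predict(train, labels, sample, k=3):
    """Classify a feature vector (e.g. retired instructions, cache
    accesses, bus cycles per slice) by majority vote among its k
    nearest labeled training vectors."""
    dists = sorted((math.dist(sample, t), lab)
                   for t, lab in zip(train, labels))
    votes = {}
    for _, lab in dists[:k]:
        votes[lab] = votes.get(lab, 0) + 1
    return max(votes, key=votes.get)
```

In the website-fingerprinting setting, each label would be one of the 40 profiled sites and each training vector one monitored page load; the paper's stronger results come from the more elaborate models (SVMs, CNNs).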

    Using mixed methods to track the growth of the Web: tracing open government data initiatives

    In recent years, there have been a rising number of Open Government Data (OGD) initiatives: a political, social, and technical movement with the common goal of publishing government data in open, re-usable formats in order to improve citizen-to-government transparency, efficiency, and democracy. As a sign of commitment, the Open Government Partnership was formed, comprising a collection of countries striving to achieve OGD. Since its initial launch, the number of countries committed to adopting an Open Government Data agenda has grown to more than 50, including countries from South America to the Far East. Current approaches to understanding Web initiatives such as OGD are still being developed. Methodologies grounded in multidisciplinarity have yet to be established; typically, research follows a social or a technological approach underpinned by quantitative or qualitative methods, rarely combining the two into a single analytical framework. In this paper, a mixed-methods approach is introduced, which uses qualitative data underpinned by sociological theory to complement a quantitative analysis using computer science techniques. This method aims to provide an alternative approach to understanding the socio-technical activities of the Web. To demonstrate this, the activities of the UK Open Government Data initiative are explored using a range of quantitative and qualitative data, examining the activities of the community, to provide a rich analysis of the formation and development of the UK OGD community.