14,829 research outputs found

    Privacy Tradeoffs in Predictive Analytics

    Full text link
    Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.Comment: Extended version of the paper appearing in SIGMETRICS 201

    Methodological criteria for the assessment of moderators in systematic reviews of randomised controlled trials : a consensus study

    Get PDF
    Background: Current methodological guidelines provide advice about the assessment of sub-group analysis within RCTs, but do not specify explicit criteria for assessment. Our objective was to provide researchers with a set of criteria that will facilitate the grading of evidence for moderators, in systematic reviews. Method: We developed a set of criteria from methodological manuscripts (n = 18) using snowballing technique, and electronic database searches. Criteria were reviewed by an international Delphi panel (n = 21), comprising authors who have published methodological papers in this area, and researchers who have been active in the study of sub-group analysis in RCTs. We used the Research ANd Development/University of California Los Angeles appropriateness method to assess consensus on the quantitative data. Free responses were coded for consensus and disagreement. In a subsequent round additional criteria were extracted from the Cochrane Reviewers’ Handbook, and the process was repeated. Results: The recommendations are that meta-analysts report both confirmatory and exploratory findings for subgroups analysis. Confirmatory findings must only come from studies in which a specific theory/evidence based apriori statement is made. Exploratory findings may be used to inform future/subsequent trials. However, for inclusion in the meta-analysis of moderators, the following additional criteria should be applied to each study: Baseline factors should be measured prior to randomisation, measurement of baseline factors should be of adequate reliability and validity, and a specific test of the interaction between baseline factors and interventions must be presented. Conclusions: There is consensus from a group of 21 international experts that methodological criteria to assess moderators within systematic reviews of RCTs is both timely and necessary. The consensus from the experts resulted in five criteria divided into two groups when synthesising evidence: confirmatory findings to support hypotheses about moderators and exploratory findings to inform future research. These recommendations are discussed in reference to previous recommendations for evaluating and reporting moderator studies

    Multiple imputation for sharing precise geographies in public use data

    Full text link
    When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS506 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Data DNA: The Next Generation of Statistical Metadata

    Get PDF
    Describes the components of a complete statistical metadata system and suggests ways to create and structure metadata for better access and understanding of data sets by diverse users
    • …
    corecore