Privacy Tradeoffs in Predictive Analytics
Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction.
Comment: Extended version of the paper appearing in SIGMETRICS 201
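For readers unfamiliar with the setting, the sketch below shows plain rating prediction by matrix factorization, trained with stochastic gradient descent. It is not the privacy-preserving protocol described in the abstract, only the kind of baseline predictor such a protocol would sit on top of. The function names, hyperparameters, and toy ratings are all assumptions made for illustration.

```python
import math
import random


def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.02,
              epochs=200, seed=0):
    """Learn user and item factors from (user, item, rating) triples via SGD."""
    rng = random.Random(seed)
    # Small random initial factors (hypothetical choice of scale).
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(U, V, u, i)
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V


def predict(U, V, u, i):
    """Predicted rating is the inner product of user and item factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def rmse(ratings, U, V):
    """Root mean squared error over the given rating triples."""
    se = sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Made-up toy data: (user index, item index, rating on a 1-5 scale).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

U0, V0 = factorize(ratings, n_users=3, n_items=3, epochs=0)  # untrained
U, V = factorize(ratings, n_users=3, n_items=3)              # trained
print("RMSE before training:", rmse(ratings, U0, V0))
print("RMSE after training:", rmse(ratings, U, V))
```

In this setting, the learned user factors are the quantity from which private attributes might be inferred, which is why the abstract frames privacy in terms of what the service is allowed to learn.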
Methodological criteria for the assessment of moderators in systematic reviews of randomised controlled trials : a consensus study
Background: Current methodological guidelines provide advice about the assessment of sub-group analysis within
RCTs, but do not specify explicit criteria for assessment. Our objective was to provide researchers with a set of
criteria that will facilitate the grading of evidence for moderators, in systematic reviews.
Method: We developed a set of criteria from methodological manuscripts (n = 18) using a snowballing technique
and electronic database searches. Criteria were reviewed by an international Delphi panel (n = 21), comprising
authors who have published methodological papers in this area, and researchers who have been active in the
study of sub-group analysis in RCTs. We used the RAND/UCLA (Research and Development/University of California, Los Angeles)
appropriateness method to assess consensus on the quantitative data. Free responses were coded for consensus
and disagreement. In a subsequent round additional criteria were extracted from the Cochrane Reviewers’
Handbook, and the process was repeated.
Results: The recommendations are that meta-analysts report both confirmatory and exploratory findings for subgroup
analysis. Confirmatory findings must only come from studies in which a specific theory- or evidence-based a priori
statement is made. Exploratory findings may be used to inform future/subsequent trials. However, for
inclusion in the meta-analysis of moderators, the following additional criteria should be applied to each study:
baseline factors should be measured prior to randomisation, measurement of baseline factors should be of
adequate reliability and validity, and a specific test of the interaction between baseline factors and interventions
must be presented.
Conclusions: There is consensus from a group of 21 international experts that methodological criteria to assess
moderators within systematic reviews of RCTs are both timely and necessary. The consensus from the experts
resulted in five criteria divided into two groups when synthesising evidence: confirmatory findings to support
hypotheses about moderators and exploratory findings to inform future research. These recommendations are
discussed in reference to previous recommendations for evaluating and reporting moderator studies.
Multiple imputation for sharing precise geographies in public use data
When releasing data to the public, data stewards are ethically and often
legally obligated to protect the confidentiality of data subjects' identities
and sensitive attributes. They also strive to release data that are informative
for a wide range of secondary analyses. Achieving both objectives is
particularly challenging when data stewards seek to release highly resolved
geographical information. We present an approach for protecting the
confidentiality of data with geographic identifiers based on multiple
imputation. The basic idea is to convert geography to latitude and longitude,
estimate a bivariate response model conditional on attributes, and simulate new
latitude and longitude values from these models. We illustrate the proposed
methods using data describing causes of death in Durham, North Carolina. In the
context of the application, we present a straightforward tool for generating
simulated geographies and attributes based on regression trees, and we present
methods for assessing disclosure risks with such simulated data.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS506 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
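The three steps named in the abstract above (work in latitude and longitude, fit a bivariate model conditional on attributes, simulate new coordinates) can be sketched roughly as follows. This is a simplified stand-in, not the authors' method: it assumes a single categorical attribute and fits one bivariate normal per attribute value, whereas the abstract mentions regression trees. It also skips the parameter uncertainty that a full multiple-imputation procedure would propagate. All names and the toy records are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_bivariate_by_group(records):
    """Fit a bivariate normal to (lat, lon) within each attribute value.

    records: list of (attribute, lat, lon) tuples.
    Returns {attribute: (mean_lat, mean_lon, var_lat, cov, var_lon)}.
    """
    groups = defaultdict(list)
    for attr, lat, lon in records:
        groups[attr].append((lat, lon))
    models = {}
    for attr, pts in groups.items():
        n = len(pts)
        m_lat = sum(p[0] for p in pts) / n
        m_lon = sum(p[1] for p in pts) / n
        d = max(n - 1, 1)  # sample (co)variance denominator
        v_lat = sum((p[0] - m_lat) ** 2 for p in pts) / d
        v_lon = sum((p[1] - m_lon) ** 2 for p in pts) / d
        cov = sum((p[0] - m_lat) * (p[1] - m_lon) for p in pts) / d
        models[attr] = (m_lat, m_lon, v_lat, cov, v_lon)
    return models


def simulate(records, models, rng):
    """Replace each record's coordinates with a draw from its group's model."""
    out = []
    for attr, _, _ in records:
        m_lat, m_lon, v_lat, cov, v_lon = models[attr]
        # Cholesky factor of the 2x2 covariance matrix.
        a = math.sqrt(v_lat)
        b = cov / a if a > 0 else 0.0
        c = math.sqrt(max(v_lon - b * b, 0.0))
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((attr, m_lat + a * z1, m_lon + b * z1 + c * z2))
    return out


def multiply_impute(records, m=5, seed=0):
    """Return m synthetic copies of the data with simulated coordinates."""
    rng = random.Random(seed)
    models = fit_bivariate_by_group(records)
    return [simulate(records, models, rng) for _ in range(m)]


# Made-up records: (attribute, latitude, longitude), loosely near Durham, NC.
records = [
    ("A", 35.99, -78.90), ("A", 36.01, -78.92), ("A", 35.97, -78.88),
    ("B", 35.95, -78.95), ("B", 35.93, -78.97), ("B", 35.96, -78.93),
]
datasets = multiply_impute(records, m=5)
print(datasets[0])
```

Releasing several synthetic copies rather than one lets secondary analysts combine estimates across copies and account for the extra variability introduced by simulation.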
Data DNA: The Next Generation of Statistical Metadata
Describes the components of a complete statistical metadata system and suggests ways to create and structure metadata for better access and understanding of data sets by diverse users.