A Simple Algorithm for Estimating Distribution Parameters from -Dimensional Randomized Binary Responses
Randomized response is attractive for privacy preserving data collection
because the provided privacy can be quantified by means such as differential
privacy. However, recovering and analyzing statistics involving multiple
dependent randomized binary attributes can be difficult, posing a significant
barrier to use. In this work, we address this problem by identifying and
analyzing a family of response randomizers that change each binary attribute
independently with the same probability. Modes of Google's Rappor randomizer as
well as applications of two well-known classical randomized response methods,
Warner's original method and Simmons' unrelated question method, belong to this
family. We show that randomizers in this family transform multinomial
distribution parameters by an iterated Kronecker product of an invertible and
bisymmetric matrix. This allows us to present a simple and
efficient algorithm for obtaining unbiased maximum likelihood parameter
estimates for -way marginals from randomized responses and provide
theoretical bounds on the statistical efficiency achieved. We also describe the
efficiency - differential privacy tradeoff. Importantly, both randomization of
responses and the estimation algorithm are simple to implement, an aspect
critical to technologies for privacy protection and security.
Comment: Accepted at Information Security - 21st International Conference, ISC
2018. Adapted to meet article length requirements. Fixed typo. Results
unchanged.
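As a rough illustration of the estimation idea described in the abstract above, the sketch below assumes each of d binary attributes is flipped independently with the same probability p (a Warner-style randomizer), so that the per-attribute transition matrix is a 2x2 bisymmetric matrix. The function names, the 2x2 form, and the dense Kronecker construction are assumptions made for illustration only; this is not the paper's algorithm, which may be considerably more efficient.

```python
from functools import reduce

import numpy as np


def flip_matrix(p):
    """Per-attribute transition matrix: keep a bit with prob. 1-p, flip it with prob. p.

    Assumption: p != 0.5, otherwise the matrix is singular.
    """
    return np.array([[1.0 - p, p], [p, 1.0 - p]])


def kron_power(m, d):
    """Iterated Kronecker product of m with itself, d factors in total."""
    return reduce(np.kron, [m] * d)


def estimate_parameters(observed_freq, p, d):
    """Map observed frequencies of the 2**d randomized outcomes back to
    estimates of the original multinomial parameters.

    Uses the identity inv(C kron ... kron C) = inv(C) kron ... kron inv(C).
    """
    c_inv = np.linalg.inv(flip_matrix(p))
    return kron_power(c_inv, d) @ np.asarray(observed_freq, dtype=float)


# Hypothetical usage: two attributes, flip probability 0.25.
true_params = np.array([0.1, 0.2, 0.3, 0.4])
randomized = kron_power(flip_matrix(0.25), 2) @ true_params
recovered = estimate_parameters(randomized, 0.25, 2)
```

With finite samples the observed frequencies are noisy, so the recovered vector can contain small negative entries; how to handle that is outside this sketch.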
A Practically Competitive and Provably Consistent Algorithm for Uplift Modeling
Randomized experiments have been critical tools of decision making for
decades. However, subjects can show significant heterogeneity in response to
treatments in many important applications. Therefore it is not enough to simply
know which treatment is optimal for the entire population. What we need is a
model that correctly customizes treatment assignment based on subject
characteristics. The problem of constructing such models from randomized
experiment data is known as Uplift Modeling in the literature. Many algorithms
have been proposed for uplift modeling and some have generated promising
results on various data sets. Yet little is known about the theoretical
properties of these algorithms. In this paper, we propose a new tree-based
ensemble algorithm for uplift modeling. Experiments show that our algorithm can
achieve competitive results on both synthetic and industry-provided data. In
addition, by properly tuning the "node size" parameter, our algorithm is proved
to be consistent under mild regularity conditions. This is the first consistent
algorithm for uplift modeling that we are aware of.
Comment: Accepted by 2017 IEEE International Conference on Data Mining
Interpretable Subgroup Discovery in Treatment Effect Estimation with Application to Opioid Prescribing Guidelines
The dearth of prescribing guidelines for physicians is one key driver of the
current opioid epidemic in the United States. In this work, we analyze medical
and pharmaceutical claims data to draw insights on characteristics of patients
who are more prone to adverse outcomes after an initial synthetic opioid
prescription. Toward this end, we propose a generative model that allows
discovery from observational data of subgroups that demonstrate an enhanced or
diminished causal effect due to treatment. Our approach models these
sub-populations as a mixture distribution, using sparsity to enhance
interpretability, while jointly learning nonlinear predictors of the potential
outcomes to better adjust for confounding. The approach leads to
human-interpretable insights on discovered subgroups, improving the practical
utility for decision support.
A general class of zero-or-one inflated beta regression models
This paper proposes a general class of regression models for continuous
proportions when the data contain zeros or ones. The proposed class of models
assumes that the response variable has a mixed continuous-discrete distribution
with probability mass at zero or one. The beta distribution is used to describe
the continuous component of the model, since its density has a wide range of
different shapes depending on the values of the two parameters that index the
distribution. We use a suitable parameterization of the beta law in terms of
its mean and a precision parameter. The parameters of the mixture distribution
are modeled as functions of regression parameters. We provide inference,
diagnostic, and model selection tools for this class of models. A practical
application that employs real data is presented.
Comment: 21 pages, 3 figures, 5 tables. Computational Statistics and Data
Analysis, 17 October 2011, ISSN 0167-9473
(http://www.sciencedirect.com/science/article/pii/S0167947311003628)
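For readers unfamiliar with the model class summarized above, one common way to write a zero-or-one inflated beta density with a mean-precision parameterization is sketched below. The symbols (c for the inflated value, α for the point mass, μ for the beta mean, φ for the precision) are assumed notation and may differ from the paper's.

```latex
% Assumed notation: c in {0,1} is the inflated value, \alpha = P(y = c),
% \mu is the mean and \phi the precision of the beta component.
% The cases environment requires the amsmath package.
\[
f(y;\mu,\phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\,
  y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \qquad 0 < y < 1,
\]
\[
\mathrm{bi}_c(y;\alpha,\mu,\phi) =
\begin{cases}
\alpha, & y = c,\\[2pt]
(1-\alpha)\, f(y;\mu,\phi), & y \in (0,1).
\end{cases}
\]
```

Under this parameterization the beta component has mean μ and variance μ(1-μ)/(1+φ), so a larger φ means less dispersion around the mean.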