Search CORE

11,851 research outputs found

A Simple Algorithm for Estimating Distribution Parameters from $n$-Dimensional Randomized Binary Responses

Author: Vinterbo Staal A.
Publication date: 01/01/2018

Randomized response is attractive for privacy-preserving data collection because the privacy it provides can be quantified by means such as differential privacy. However, recovering and analyzing statistics involving multiple dependent randomized binary attributes can be difficult, posing a significant barrier to use. In this work, we address this problem by identifying and analyzing a family of response randomizers that change each binary attribute independently with the same probability. Modes of Google's Rappor randomizer, as well as applications of two well-known classical randomized response methods, Warner's original method and Simmons' unrelated question method, belong to this family. We show that randomizers in this family transform multinomial distribution parameters by an iterated Kronecker product of an invertible and bisymmetric $2 \times 2$ matrix. This allows us to present a simple and efficient algorithm for obtaining unbiased maximum likelihood parameter estimates for $k$-way marginals from randomized responses and to provide theoretical bounds on the statistical efficiency achieved. We also describe the efficiency-differential privacy tradeoff. Importantly, both the randomization of responses and the estimation algorithm are simple to implement, an aspect critical to technologies for privacy protection and security. Comment: Accepted at Information Security - 21st International Conference, ISC 2018. Adapted to meet article length requirements. Fixed typo. Results unchanged
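To make the Kronecker-product observation concrete, here is a minimal sketch under stated assumptions. It assumes a Warner-style randomizer that flips every bit independently with probability p, so the per-attribute matrix is M = [[1-p, p], [p, 1-p]], which is bisymmetric and invertible whenever p != 1/2. Under that assumption, a $k$-way marginal of the randomized responses equals the $k$-fold Kronecker power of M applied to the true marginal. The estimator therefore applies the inverse of M along each axis of the empirical marginal. The function names (`randomize`, `inverse_factor`, `estimate_marginal`) are hypothetical, and the paper's algorithm and efficiency analysis may differ in detail.

```python
import random
from collections import Counter
from itertools import product


def randomize(bits, p, rng):
    """Flip each binary attribute independently with the same probability p."""
    return tuple(b ^ int(rng.random() < p) for b in bits)


def inverse_factor(p):
    """Inverse of the bisymmetric 2x2 matrix M = [[1-p, p], [p, 1-p]],
    where M[out][true] is the probability of reporting `out` given `true`."""
    if abs(1 - 2 * p) < 1e-12:
        raise ValueError("M is singular at p = 0.5; the responses carry no signal")
    s = 1.0 / (1 - 2 * p)
    return [[s * (1 - p), -s * p], [-s * p, s * (1 - p)]]


def estimate_marginal(responses, attrs, p):
    """Estimate the true k-way marginal over `attrs` (attribute indices).

    Applying M^{-1} along each of the k axes of the empirical marginal is the
    same as multiplying by the k-fold Kronecker product M^{-1} (x) ... (x) M^{-1}.
    In small samples some cell estimates can come out slightly negative.
    """
    k = len(attrs)
    counts = Counter(tuple(r[a] for a in attrs) for r in responses)
    est = {cell: counts[cell] / len(responses) for cell in product((0, 1), repeat=k)}
    inv = inverse_factor(p)
    for axis in range(k):
        # true-cell value = sum over observed values v of M^{-1}[true][v] * freq
        est = {
            cell: sum(
                inv[cell[axis]][v] * est[cell[:axis] + (v,) + cell[axis + 1:]]
                for v in (0, 1)
            )
            for cell in est
        }
    return est


if __name__ == "__main__":
    rng = random.Random(0)
    truth = []
    for _ in range(100_000):
        a = int(rng.random() < 0.3)              # attribute 0: prevalence 0.3
        b = a if rng.random() < 0.8 else 1 - a   # attribute 1: agrees with a 80% of the time
        c = int(rng.random() < 0.5)              # attribute 2: unrelated coin flip
        truth.append((a, b, c))
    reported = [randomize(t, 0.2, rng) for t in truth]
    print(estimate_marginal(reported, (0, 1), 0.2))
```

Because the columns of M sum to one, marginalizing over the attributes that are not of interest commutes with the randomization. This is why the sketch only ever inverts a $k$-fold product rather than one over all $n$ attributes.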

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

A Practically Competitive and Provably Consistent Algorithm for Uplift Modeling

Author: Fang Xiao
Simchi-Levi David
Zhao Yan
Publication date: 11/09/2017

Randomized experiments have been critical tools for decision making for decades. However, in many important applications, subjects can show significant heterogeneity in their response to treatments. Therefore, it is not enough to simply know which treatment is optimal for the entire population. What we need is a model that correctly customizes treatment assignment based on subject characteristics. The problem of constructing such models from randomized experiment data is known in the literature as Uplift Modeling. Many algorithms have been proposed for uplift modeling, and some have generated promising results on various data sets. Yet little is known about the theoretical properties of these algorithms. In this paper, we propose a new tree-based ensemble algorithm for uplift modeling. Experiments show that our algorithm can achieve competitive results on both synthetic and industry-provided data. In addition, by properly tuning the "node size" parameter, our algorithm is proved to be consistent under mild regularity conditions. This is the first consistent algorithm for uplift modeling that we are aware of. Comment: Accepted by 2017 IEEE International Conference on Data Mining
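The quantity that uplift models target is commonly defined as the difference in expected outcome between treated and untreated subjects who share the same characteristics. The sketch below estimates that difference within fixed, hand-chosen segments, roughly what a single leaf of an uplift tree computes. It is not the paper's tree-based ensemble. The segmentation function, the record layout, and the names `segment_uplift` and `assign_treatment` are hypothetical.

```python
from collections import defaultdict


def segment_uplift(records, segment_of):
    """Estimate uplift per segment from randomized-experiment records.

    records: iterable of (features, treated, outcome) triples.
    segment_of: function mapping features to a segment label.
    Returns {segment: mean outcome of treated - mean outcome of control}.
    Segments lacking treated or control subjects are skipped, since their
    uplift cannot be estimated from the data.
    """
    # per segment: [treated outcome sum, treated count, control outcome sum, control count]
    stats = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for features, treated, outcome in records:
        s = stats[segment_of(features)]
        offset = 0 if treated else 2
        s[offset] += outcome
        s[offset + 1] += 1
    return {
        seg: s[0] / s[1] - s[2] / s[3]
        for seg, s in stats.items()
        if s[1] > 0 and s[3] > 0
    }


def assign_treatment(uplift):
    """Customize assignment: treat a segment only if its estimated uplift is positive."""
    return {seg: u > 0 for seg, u in uplift.items()}


def by_age(features):
    """Hypothetical segmentation rule used only for this example."""
    return "under 50" if features["age"] < 50 else "50+"


records = [
    ({"age": 30}, True, 1), ({"age": 30}, False, 0),
    ({"age": 30}, True, 1), ({"age": 30}, False, 1),
    ({"age": 70}, True, 0), ({"age": 70}, False, 1),
]
uplift = segment_uplift(records, by_age)
print(uplift)                    # {'under 50': 0.5, '50+': -1.0}
print(assign_treatment(uplift))  # {'under 50': True, '50+': False}
```

Very small segments give noisy differences of means. This gives some intuition for why a minimum node size matters in tree-based uplift methods.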

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Copula-based models for multivariate discrete response data

Author: Nikoloulopoulos Aristidis K
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

University of East Anglia digital repository

Interpretable Subgroup Discovery in Treatment Effect Estimation with Application to Opioid Prescribing Guidelines

Author: Che Zhengping
Crofford Leslie J.
Davis Mellar P.
Dusseldorp Elise
Gebhart G. F.
Harbaugh Calista M.
Johansson Fredrik
Kingma Diederik P
Marlin Benjamin M.
Parente Stephen T.
Patil Pravinkumar R.
Rubin Donald B.
Shalit Uri
Su Xiaogang
Zhang Jinghe
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/03/2020

The dearth of prescribing guidelines for physicians is one key driver of the current opioid epidemic in the United States. In this work, we analyze medical and pharmaceutical claims data to draw insights on characteristics of patients who are more prone to adverse outcomes after an initial synthetic opioid prescription. Toward this end, we propose a generative model that allows discovery from observational data of subgroups that demonstrate an enhanced or diminished causal effect due to treatment. Our approach models these sub-populations as a mixture distribution, using sparsity to enhance interpretability, while jointly learning nonlinear predictors of the potential outcomes to better adjust for confounding. The approach leads to human-interpretable insights on discovered subgroups, improving the practical utility for decision support.

arXiv.org e-Print Archive

Crossref

A general class of zero-or-one inflated beta regression models

Author: Akaike
Atkinson
Cook
Cox
Dunn
Espinheira
Fahrmeir
Ferrari
Hoff
Ihaka
Johnson
Kieschnick
Korhonen
McCullagh
McFadden
Moolgavkar
Ospina
Pace
Paolino
Press
Ramalho
Ramsey
Rao
Raydonal Ospina
Rigby
Schwarz
Silvia L.P. Ferrari
Simas
Smithson
Stasinopoulos
Venables
Wei
Yoo
Publication venue: 'Elsevier BV'
Publication date: 02/11/2011

This paper proposes a general class of regression models for continuous proportions when the data contain zeros or ones. The proposed class of models assumes that the response variable has a mixed continuous-discrete distribution with probability mass at zero or one. The beta distribution is used to describe the continuous component of the model, since its density has a wide range of different shapes depending on the values of the two parameters that index the distribution. We use a suitable parameterization of the beta law in terms of its mean and a precision parameter. The parameters of the mixture distribution are modeled as functions of regression parameters. We provide inference, diagnostic, and model selection tools for this class of models. A practical application that employs real data is presented. Comment: 21 pages, 3 figures, 5 tables. Computational Statistics and Data Analysis, 17 October 2011, ISSN 0167-9473 (http://www.sciencedirect.com/science/article/pii/S0167947311003628)
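The mixed continuous-discrete response described above can be written down directly as a log-likelihood. The sketch below assumes the common mean-precision form of the beta density, with shape parameters $\mu\phi$ and $(1-\mu)\phi$, and a point mass $\alpha$ at $c \in \{0, 1\}$. Purely for illustration, it also uses logit links for $\alpha$ and $\mu$ and a log link for $\phi$, all driven by one shared covariate vector. The paper's general class is not restricted to these choices, and the function names here are hypothetical.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log density of the beta law in mean-precision form:
    shape parameters a = mu * phi and b = (1 - mu) * phi, so E[y] = mu."""
    a, b = mu * phi, (1 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log1p(-y))


def inflated_beta_logpdf(y, alpha, mu, phi, c=0):
    """One observation under a zero-inflated (c=0) or one-inflated (c=1) beta
    mixture: probability alpha at y = c, and (1 - alpha) times a beta density
    on the open interval (0, 1)."""
    if y == c:
        return math.log(alpha)
    if 0 < y < 1:
        return math.log1p(-alpha) + beta_logpdf(y, mu, phi)
    raise ValueError(f"y={y} is outside the support: {c} or the open interval (0, 1)")


def logistic(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def _dot(coef, x):
    return sum(a * b for a, b in zip(coef, x))


def neg_log_likelihood(params, X, y, c=0):
    """Hypothetical regression layer: params holds three coefficient vectors
    (for alpha, mu, phi), each the length of a covariate row in X."""
    k = len(X[0])
    rho, beta, gamma = params[:k], params[k:2 * k], params[2 * k:3 * k]
    total = 0.0
    for xi, yi in zip(X, y):
        alpha = logistic(_dot(rho, xi))    # logit link for the point mass
        mu = logistic(_dot(beta, xi))      # logit link for the beta mean
        phi = math.exp(_dot(gamma, xi))    # log link for the precision
        total -= inflated_beta_logpdf(yi, alpha, mu, phi, c)
    return total
```

In practice, a function like `neg_log_likelihood` would be handed to a numerical optimizer to obtain maximum likelihood estimates of the coefficient vectors.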

arXiv.org e-Print Archive

CiteSeerX

Crossref

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo