Search CORE

2,598 research outputs found

Assessing the disclosure protection provided by misclassification for survey microdata

Author: Shlomo Natalie
Skinner Chris
Publication venue: Southampton Statistical Sciences Research Institute
Publication date: 07/08/2009

Government statistical agencies often apply statistical disclosure limitation techniques to survey microdata to protect confidentiality. There is a need for ways to assess the protection provided. This paper develops some simple methods for assessing the protection provided by disclosure limitation techniques which perturb the values of categorical identifying variables. The methods are applied in numerical experiments based upon census data from the United Kingdom which are subject to two perturbation techniques: data swapping and the post randomisation method. Some simplifying approximations to the measure of risk are found to work well in capturing the impacts of these techniques. These approximations provide simple extensions of existing risk assessment methods based upon Poisson log-linear models. A numerical experiment is also undertaken to assess the impact of multivariate misclassification with an increasing number of identifying variables. The methods developed in this paper may also be used to obtain more realistic assessments of risk which take account of the kinds of measurement and other non-sampling errors commonly arising in surveys.
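The post randomisation method (PRAM) mentioned above perturbs a categorical variable by misclassifying each record at random according to a known transition matrix. The following is a minimal sketch of that perturbation step only, not of the paper's risk measures. It assumes `transition[i, j]` is the probability that true category `i` is released as category `j`; the function name `pram` and its signature are hypothetical.

```python
import numpy as np

def pram(values, categories, transition, seed=0):
    """Hypothetical helper: post randomisation (PRAM) of one categorical variable.

    transition[i, j] is the assumed probability that a record whose true
    category is categories[i] is released as categories[j]; rows sum to 1.
    """
    transition = np.asarray(transition, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    rng = np.random.default_rng(seed)
    index = {c: i for i, c in enumerate(categories)}
    released = []
    for v in values:
        # Draw the released category from the row belonging to the true category.
        j = rng.choice(len(categories), p=transition[index[v]])
        released.append(categories[j])
    return released
```

An identity transition matrix releases the data unchanged; moving probability mass off the diagonal increases the amount of misclassification in the released file.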

Southampton (e-Prints Soton)

Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks

Author: Bloem-Reddy Benjamin
Foster Adam
Mathieu Emile
Teh Yee Whye
Publication date: 01/01/2018

Empirical evidence suggests that heavy-tailed degree distributions occurring in many real networks are well-approximated by power laws with exponents $\eta$ that may take values either less than or greater than two. Models based on various forms of exchangeability are able to capture power laws with $\eta < 2$, and admit tractable inference algorithms; we draw on previous results to show that $\eta > 2$ cannot be generated by the forms of exchangeability used in existing random graph models. Preferential attachment models generate power law exponents greater than two, but have been of limited use as statistical models due to the inherent difficulty of performing inference in non-exchangeable models. Motivated by this gap, we design and implement inference algorithms for a recently proposed class of models that generates $\eta$ of all possible values. We show that although they are not exchangeable, these models have probabilistic structure amenable to inference. Our methods make a large class of previously intractable models useful for statistical inference. Comment: Accepted for publication in the proceedings of Conference on Uncertainty in Artificial Intelligence (UAI) 201
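The abstract contrasts exchangeable models with preferential attachment models. As a rough illustration of the latter only (not of the Beta neutral-to-the-left models the paper studies), here is a minimal sketch of linear preferential attachment in which each new node attaches one edge to an existing node with probability proportional to that node's degree. The function name and the single-edge seed graph are assumptions made for illustration.

```python
import random
from typing import List

def preferential_attachment_degrees(n: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: grow a tree by linear preferential attachment.

    Nodes 0 and 1 start joined by one edge (an assumed seed graph); each later
    node links to one existing node chosen with probability proportional to
    its current degree. Returns the list of node degrees.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    degrees = [1, 1]
    # Each node appears in `endpoints` once per unit of degree, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new_node in range(2, n):
        target = rng.choice(endpoints)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new_node])
    return degrees
```

For linear attachment of this kind the degree distribution is known to follow a power law with exponent close to 3, i.e. greater than two, which is the regime the abstract attributes to preferential attachment.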

arXiv.org e-Print Archive

Oxford University Research Archive

Rule-based Machine Learning Methods for Functional Prediction

Author: Indurkhya N.
Weiss S. M.
Publication date: 01/01/1995

We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance. Comment: See http://www.jair.org/ for any accompanying files
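Ordered DNF rules of the kind described above can be applied as a decision list: each rule is a conjunction of conditions, and the first rule whose conditions all hold supplies the predicted value. The sketch below covers prediction only (rule induction, the paper's contribution, is not shown) and assumes each condition is a hypothetical `(feature, comparison, threshold)` triple, with a default value returned when no rule fires.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Assumed condition format: (feature name, comparison, threshold), e.g. ("x1", "<=", 2.0).
Condition = Tuple[str, str, float]

_OPS = {"<=": operator.le, "<": operator.lt, ">": operator.gt, ">=": operator.ge}

@dataclass
class Rule:
    conditions: Sequence[Condition]  # conjunction: every condition must hold
    value: float                     # predicted value when the rule fires

def rule_matches(rule: Rule, x: Dict[str, float]) -> bool:
    return all(_OPS[op](x[feature], threshold)
               for feature, op, threshold in rule.conditions)

def predict_rules(rules: List[Rule], default: float, x: Dict[str, float]) -> float:
    # Rules are ordered: the first matching rule wins.
    for rule in rules:
        if rule_matches(rule, x):
            return rule.value
    return default
```

Because the rules are ordered, a later rule only needs to describe the cases left over by earlier ones, which helps keep individual rules short.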

arXiv.org e-Print Archive

CiteSeerX

Applying Deep Learning To Airbnb Search

Author: Abdool Mustafa
Barrow-Williams Nick
Collins Brendan M.
Duan Huizhong
Haldar Malay
Legrand Thomas
Ramanathan Prashant
Turnbull Bradley C.
Xu Tao
Yang Shulin
Zhang Qing
Publication venue: Association for Computing Machinery (ACM)
Publication date: 24/10/2018

The application to search ranking is one of the biggest machine learning success stories at Airbnb. Much of the initial gains were driven by a gradient boosted decision tree model. The gains, however, plateaued over time. This paper discusses the work done in applying neural networks in an attempt to break out of that plateau. We present our perspective not with the intention of pushing the frontier of new modeling techniques. Instead, ours is a story of the elements we found useful in applying neural networks to a real life product. Deep learning was steep learning for us. To other teams embarking on similar journeys, we hope an account of our struggles and triumphs will provide some useful pointers. Bon voyage! Comment: 8 pages

arXiv.org e-Print Archive

Crossref

The conditional permutation test for independence while controlling for confounders

Author: Athey
Barber
Belloni
Candès
Cover
Dawid
Doran
Ernst
Fukumizu
Gretton
Hennessy
Kojadinovic
Pfister
Rosenbaum
Runge
Sen
Song
Stigler
Strobl
Su
Székely
Veraverbeke
Weihs
Zhang
Publication date: 07/05/2019

We propose a general new method, the conditional permutation test, for testing the conditional independence of variables $X$ and $Y$ given a potentially high-dimensional random vector $Z$ that may contain confounding factors. The proposed test permutes entries of $X$ non-uniformly, so as to respect the existing dependence between $X$ and $Z$ and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al. (2018), our test relies on the availability of an approximation to the distribution of $X \mid Z$. While the test of Candès et al. (2018) uses this estimate to draw new $X$ values, our test uses this approximation to design an appropriate non-uniform distribution on permutations of the $X$ values already seen in the true data. We provide an efficient Markov chain Monte Carlo sampler for the implementation of our method, and establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of $X \mid Z$, finding that, for the worst-case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set. Comment: 31 pages, 4 figures
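A simplified sketch of the test described above follows, assuming a user-supplied log-density `log_q` (up to a constant) approximating $X \mid Z$ and a user-chosen statistic where large values indicate dependence; the function names are hypothetical. Permutations are targeted with probability proportional to $\prod_i q(X_{\pi(i)} \mid Z_i)$. This sampler proposes one random pairwise swap per step and uses a Besag–Clifford style "hub" so that, when `log_q` is correct and the null holds, the observed and permuted data are exchangeable; the paper's own MCMC sampler may be organised differently.

```python
import numpy as np

def _mh_swaps(perm, log_q_matrix, steps, rng):
    """Run `steps` Metropolis-Hastings moves over permutations.

    Target: P(perm) proportional to prod_i q(x[perm[i]] | z[i]), where
    log_q_matrix[i, j] = log q(x[j] | z[i]). Each move proposes swapping the
    X values assigned to two random rows (a symmetric proposal).
    """
    perm = perm.copy()
    n = len(perm)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        a, b = perm[i], perm[j]
        log_ratio = (log_q_matrix[i, b] + log_q_matrix[j, a]
                     - log_q_matrix[i, a] - log_q_matrix[j, b])
        if np.log(rng.random()) < log_ratio:
            perm[i], perm[j] = b, a
    return perm


def conditional_permutation_test(x, y, z, log_q, statistic,
                                 n_perms=200, steps=50, seed=0):
    """Hypothetical helper: permutation p-value for H0: X indep. of Y given Z."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    log_q_matrix = np.array([[log_q(x[j], z[i]) for j in range(n)]
                             for i in range(n)])
    # Besag-Clifford style: run the reversible chain from the observed ordering
    # to a "hub", then draw each permutation by an independent chain from the hub.
    hub = _mh_swaps(np.arange(n), log_q_matrix, steps, rng)
    t_obs = statistic(x, y, z)
    t_perm = np.array([
        statistic(x[_mh_swaps(hub, log_q_matrix, steps, rng)], y, z)
        for _ in range(n_perms)
    ])
    # The +1 terms count the observed data as one member of the exchangeable set.
    return (1 + np.sum(t_perm >= t_obs)) / (n_perms + 1)
```

For Gaussian $X$ given $Z$, one might pass `log_q = lambda xv, zv: -0.5 * (xv - zv) ** 2` and `statistic = lambda xs, ys, zs: abs(np.corrcoef(xs, ys)[0, 1])`.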

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository