Assessing the disclosure protection provided by misclassification for survey microdata
Government statistical agencies often apply statistical disclosure limitation techniques to survey microdata to protect confidentiality, and there is a need for ways to assess the protection these techniques provide. This paper develops some simple methods for assessing disclosure risk under techniques that perturb the values of categorical identifying variables. The methods are applied in numerical experiments based on census data from the United Kingdom that are subjected to two perturbation techniques: data swapping and the post randomisation method. Some simplifying approximations to the measure of risk are found to work well in capturing the impacts of these techniques. These approximations provide simple extensions of existing risk assessment methods based on Poisson log-linear models. A numerical experiment is also undertaken to assess the impact of multivariate misclassification as the number of identifying variables increases. The methods developed in this paper may also be used to obtain more realistic assessments of risk that take account of the kinds of measurement and other non-sampling errors commonly arising in surveys.
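The post randomisation method (PRAM) named in this abstract can be illustrated with a short sketch. Each record's true category j is independently released as category k with probability P[j, k], where P is a row-stochastic misclassification matrix. This is not the paper's code: the function names `pram` and `expected_released_counts`, the integer coding 0..K-1 of categories, and the example matrix are all hypothetical. The second function computes the expected released frequencies, P^T times the vector of true frequencies.

```python
import numpy as np


def pram(categories, transition, rng):
    """Post-randomise integer category codes 0..K-1 (illustrative sketch).

    transition[j, k] is the probability that true category j is released as k.
    """
    transition = np.asarray(transition, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    n_cats = transition.shape[0]
    # Each record is misclassified independently, using the row of its true category.
    return np.array([rng.choice(n_cats, p=transition[c]) for c in categories])


def expected_released_counts(true_counts, transition):
    """Expected released frequencies: sum_j true_counts[j] * transition[j, k]."""
    transition = np.asarray(transition, dtype=float)
    return transition.T @ np.asarray(true_counts, dtype=float)


if __name__ == "__main__":
    # Hypothetical 3-category misclassification matrix.
    P = np.array([[0.9, 0.1, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.0, 0.2, 0.8]])
    rng = np.random.default_rng(0)
    print(pram(np.array([0, 1, 1, 2, 2, 2]), P, rng))
    print(expected_released_counts([100, 50, 30], P))
```

With the identity matrix PRAM releases the data unchanged; the further P moves from the identity, the more protection (and the more measurement error) it introduces.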
Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks
Empirical evidence suggests that heavy-tailed degree distributions occurring
in many real networks are well approximated by power laws with exponents
that may take values either less than or greater than two. Models based on
various forms of exchangeability are able to capture power laws with exponents
less than two, and admit tractable inference algorithms; we draw on previous
results to show that exponents greater than two cannot be generated by the
forms of exchangeability used in existing random graph models. Preferential
attachment models generate power-law exponents greater than two, but have been
of limited use as statistical models due to the inherent difficulty of
performing inference in non-exchangeable models. Motivated by this gap, we
design and implement inference algorithms for a recently proposed class of
models that can generate power-law exponents taking all possible values. We
show that although these models are not exchangeable, they have probabilistic
structure amenable to inference. Our methods make a large class of previously
intractable models useful for statistical inference.
Comment: Accepted for publication in the proceedings of the Conference on
Uncertainty in Artificial Intelligence (UAI) 201
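To see which regime an observed network falls in (exponent below or above two), one common approach, not taken from this paper, is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009): alpha_hat = 1 + n / sum_i ln(d_i / (d_min - 1/2)), computed over the n degrees d_i >= d_min. The function name and the default `d_min` below are hypothetical choices; in practice the lower cutoff should itself be estimated, and the approximation is known to be rough when `d_min` is small.

```python
import math
from typing import Iterable


def powerlaw_exponent_mle(degrees: Iterable[int], d_min: int = 1) -> float:
    """Approximate discrete MLE of alpha in P(d) ~ d^(-alpha) for d >= d_min.

    Uses the continuous approximation of Clauset, Shalizi and Newman (2009):
    alpha_hat = 1 + n / sum(log(d_i / (d_min - 0.5))).
    """
    if d_min < 1:
        raise ValueError("d_min must be a positive integer")
    # Only the tail d >= d_min is assumed to follow the power law.
    tail = [d for d in degrees if d >= d_min]
    if not tail:
        raise ValueError("no degrees at or above d_min")
    log_sum = sum(math.log(d / (d_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Hypothetical degree sequence, for illustration only.
    print(powerlaw_exponent_mle([1, 1, 2, 3, 1, 5, 1, 2, 8, 1]))
```

An estimate below two points to the regime that exchangeable models can capture; an estimate above two points to the regime associated with preferential attachment.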
Rule-based Machine Learning Methods for Functional Prediction
We describe a machine learning method for predicting the value of a
real-valued function, given the values of multiple input variables. The method
induces solutions from samples in the form of ordered disjunctive normal form
(DNF) decision rules. A central objective of the method and representation is
the induction of compact, easily interpretable solutions. This rule-based
decision model can be extended to search efficiently for similar cases prior to
approximating function values. Experimental results on real-world data
demonstrate that the new techniques are competitive with existing machine
learning and statistical methods and can sometimes yield superior regression
performance.
Comment: See http://www.jair.org/ for any accompanying file
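A minimal sketch of how an ordered list of DNF decision rules can produce a real-valued prediction: a rule fires if any of its conjunctions has all of its conditions true, and the first rule that fires determines the output. Several details are assumptions not stated in the abstract: conditions are arbitrary predicates on a dict of input variables, each rule predicts a single constant, and a default value is returned when no rule fires. The names `Rule` and `predict` are hypothetical, and the method described may attach richer predictions to rules (including the similar-case search it mentions).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Example = Dict[str, float]
Condition = Callable[[Example], bool]


@dataclass
class Rule:
    # DNF: the rule fires if ANY conjunction has ALL of its conditions true.
    disjuncts: List[List[Condition]]
    value: float

    def matches(self, ex: Example) -> bool:
        return any(all(cond(ex) for cond in conj) for conj in self.disjuncts)


def predict(rules: Sequence[Rule], ex: Example, default: float) -> float:
    # Ordered rule list: the first matching rule determines the prediction.
    for rule in rules:
        if rule.matches(ex):
            return rule.value
    return default


if __name__ == "__main__":
    # Hypothetical rules: (x1 > 5 AND x2 < 2) OR (x1 > 9) -> 10.0; else x2 >= 2 -> 3.0.
    rules = [
        Rule([[lambda e: e["x1"] > 5, lambda e: e["x2"] < 2],
              [lambda e: e["x1"] > 9]], 10.0),
        Rule([[lambda e: e["x2"] >= 2]], 3.0),
    ]
    print(predict(rules, {"x1": 1.0, "x2": 5.0}, default=0.0))
```

Ordering the rules keeps each one short: a later rule only needs to describe the cases that earlier rules did not already cover, which supports the compact, interpretable solutions the abstract aims for.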
Applying Deep Learning To Airbnb Search
The application to search ranking is one of the biggest machine learning
success stories at Airbnb. Much of the initial gain was driven by a gradient
boosted decision tree model. The gains, however, plateaued over time. This
paper discusses the work done in applying neural networks in an attempt to
break out of that plateau. We present our perspective not with the intention of
pushing the frontier of new modeling techniques. Instead, ours is a story of
the elements we found useful in applying neural networks to a real-life
product. Deep learning was steep learning for us. To other teams embarking on
similar journeys, we hope an account of our struggles and triumphs will provide
some useful pointers. Bon voyage!
Comment: 8 pages
The conditional permutation test for independence while controlling for confounders
We propose a general new method, the conditional permutation test, for
testing the conditional independence of variables $X$ and $Y$ given a
potentially high-dimensional random vector $Z$ that may contain confounding
factors. The proposed test permutes entries of $X$ non-uniformly, so as to
respect the existing dependence between $X$ and $Z$ and thus account for the
presence of these confounders. Like the conditional randomization test of
Candès et al. (2018), our test relies on the availability of an approximation
to the distribution of $X \mid Z$. While the test of Candès et al. (2018) uses
this estimate to draw new $X$ values, our test uses this approximation to
design an appropriate non-uniform distribution on permutations of the $X$
values already seen in the true data. We provide an efficient Markov chain
Monte Carlo sampler for implementing our method, and establish bounds
on the Type I error in terms of the error in the approximation of the
conditional distribution of $X \mid Z$, finding that, for the worst-case test
statistic, the inflation in Type I error of the conditional permutation test is
no larger than that of the conditional randomization test. We validate these
theoretical results with experiments on simulated data and on the Capital
Bikeshare data set.
Comment: 31 pages, 4 figures
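The permutation idea can be sketched in simplified form, under stated assumptions; this is not the authors' sampler. The target distribution over permutations $\pi$ is taken proportional to $\prod_i q(X_{\pi(i)} \mid Z_i)$, where `log_q` is a hypothetical Gaussian approximation with mean `beta * z`. The MCMC move is a single random pairwise swap with a Metropolis accept step, and the test statistic is the absolute sample correlation between X and Y. The chain is run from the observed ordering to a "hub," and independent chains are then run from that hub, in the style of Besag and Clifford (1989). Because the swap chain is reversible, this keeps the copies exchangeable with the observed data when the null holds and `log_q` is correct. The names `log_q`, `swap_chain` and `cpt_pvalue` are illustrative.

```python
import numpy as np


def log_q(x, z, beta=1.0, sigma=1.0):
    # Hypothetical approximate log-density of X given Z (up to a constant):
    # X | Z ~ Normal(beta * z, sigma^2). Replace with a fitted model in practice.
    return -0.5 * ((x - beta * z) / sigma) ** 2


def swap_chain(perm, x, z, n_steps, rng):
    """Metropolis chain on permutations targeting prod_i q(x[perm[i]] | z[i])."""
    perm = perm.copy()
    n = len(perm)
    for _ in range(n_steps):
        # Pick two distinct positions i != j uniformly at random.
        i = rng.integers(n)
        j = rng.integers(n - 1)
        if j >= i:
            j += 1
        a, b = perm[i], perm[j]
        # Log ratio of target weights after vs. before swapping positions i and j.
        log_ratio = (log_q(x[b], z[i]) + log_q(x[a], z[j])
                     - log_q(x[a], z[i]) - log_q(x[b], z[j]))
        if rng.random() < np.exp(min(0.0, log_ratio)):
            perm[i], perm[j] = b, a
    return perm


def cpt_pvalue(x, y, z, n_copies=99, n_steps=500, seed=0):
    """Monte Carlo p-value for H0: X independent of Y given Z."""
    rng = np.random.default_rng(seed)

    def stat(x_values):
        # Test statistic: absolute sample correlation between X and Y.
        return abs(np.corrcoef(x_values, y)[0, 1])

    t_obs = stat(x)
    # Run from the observed ordering (identity) to a hub, then draw copies from the hub.
    hub = swap_chain(np.arange(len(x)), x, z, n_steps, rng)
    t_null = [stat(x[swap_chain(hub, x, z, n_steps, rng)]) for _ in range(n_copies)]
    # Count copies at least as extreme as the observed statistic (observed included).
    return (1 + sum(t >= t_obs for t in t_null)) / (n_copies + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.normal(size=50)
    x = z + rng.normal(size=50)            # X depends on Z, matching log_q's assumption
    y = x + 0.1 * rng.normal(size=50)      # Y depends on X directly, so H0 is false
    print(cpt_pvalue(x, y, z))
```

Only the ordering of the observed X values changes, so the permuted copies keep the exact marginal values of X. The non-uniform weights steer each value toward rows whose Z makes it plausible.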