1,364 research outputs found
Minimum Density Hyperplanes
Associating distinct groups of objects (clusters) with contiguous regions of
high probability density (high-density clusters), is central to many
statistical and machine learning approaches to the classification of unlabelled
data. We propose a novel hyperplane classifier for clustering and
semi-supervised classification which is motivated by this objective. The
proposed minimum density hyperplane minimises the integral of the empirical
probability density function along it, thereby avoiding intersection with high
density clusters. We show that the minimum density and the maximum margin
hyperplanes are asymptotically equivalent, thus linking this approach to
maximum margin clustering and semi-supervised support vector classifiers. We
propose a projection pursuit formulation of the associated optimisation problem
which allows us to find minimum density hyperplanes efficiently in practice,
and evaluate its performance on a range of benchmark datasets. The proposed
approach is found to be very competitive with state of the art methods for
clustering and semi-supervised classification
Distinguishing cause from effect using observational data: methods and benchmarks
The discovery of causal relationships from purely observational data is a
fundamental problem in science. The most elementary form of such a causal
discovery problem is to decide whether X causes Y or, alternatively, Y causes
X, given joint observations of two variables X, Y. An example is to decide
whether altitude causes temperature, or vice versa, given only joint
measurements of both variables. Even under the simplifying assumptions of no
confounding, no feedback loops, and no selection bias, such bivariate causal
discovery problems are challenging. Nevertheless, several approaches for
addressing those problems have been proposed in recent years. We review two
families of such methods: Additive Noise Methods (ANM) and Information
Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs
that consists of data for 100 different cause-effect pairs selected from 37
datasets from various domains (e.g., meteorology, biology, medicine,
engineering, economy, etc.) and motivate our decisions regarding the "ground
truth" causal directions of all pairs. We evaluate the performance of several
bivariate causal discovery methods on these real-world benchmark data and in
addition on artificially simulated data. Our empirical results on real-world
data indicate that certain methods are indeed able to distinguish cause from
effect using only purely observational data, although more benchmark data would
be needed to obtain statistically significant conclusions. One of the best
performing methods overall is the additive-noise method originally proposed by
Hoyer et al. (2009), which obtains an accuracy of 63+-10 % and an AUC of
0.74+-0.05 on the real-world benchmark. As the main theoretical contribution of
this work we prove the consistency of that method.Comment: 101 pages, second revision submitted to Journal of Machine Learning
Researc
Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities
Information-theoretic measures such as the entropy, cross-entropy and the
Kullback-Leibler divergence between two mixture models is a core primitive in
many signal processing tasks. Since the Kullback-Leibler divergence of mixtures
provably does not admit a closed-form formula, it is in practice either
estimated using costly Monte-Carlo stochastic integration, approximated, or
bounded using various techniques. We present a fast and generic method that
builds algorithmically closed-form lower and upper bounds on the entropy, the
cross-entropy and the Kullback-Leibler divergence of mixtures. We illustrate
the versatile method by reporting on our experiments for approximating the
Kullback-Leibler divergence between univariate exponential mixtures, Gaussian
mixtures, Rayleigh mixtures, and Gamma mixtures.Comment: 20 pages, 3 figure
Drawing Valid Targeted Inference When Covariate-adjusted Response-adaptive RCT Meets Data-adaptive Loss-based Estimation, With An Application To The LASSO
Adaptive clinical trial design methods have garnered growing attention in the recent years, in large part due to their greater flexibility over their traditional counterparts. One such design is the so-called covariate-adjusted, response-adaptive (CARA) randomized controlled trial (RCT). In a CARA RCT, the treatment randomization schemes are allowed to depend on the patient’s pre-treatment covariates, and the investigators have the opportunity to adjust these schemes during the course of the trial based on accruing information (including previous responses), in order to meet a pre-specified optimality criterion, while preserving the validity of the trial in learning its primary study parameter.
In this article, we propose a new group-sequential CARA RCT design and corresponding analytical procedure that admits the use of flexible data-adaptive techniques. The proposed design framework can target general adaption optimality criteria that may not have a closed-form solution, thanks to a loss- based approach in defining and estimating the unknown optimal randomization scheme. Both in predicting the conditional response and in constructing the treatment randomization schemes, this framework uses loss-based data-adaptive estimation over general classes of functions (which may change with sample size). Since the randomization adaptation is response-adaptive, this innovative flexibility potentially translates into more effective adaptation towards the optimality criterion. To target the primary study parameter, the proposed analytical method provides robust inference of the parameter, despite arbitrarily mis-specified response models, under the most general settings.
Specifically, we establish that, under appropriate entropy conditions on the classes of functions, the resulting sequence of randomization schemes converges to a fixed scheme, and the proposed treatment effect estimator is consistent (even under a mis-specified response model), asymptotically Gaussian, and gives rise to valid confidence intervals of given asymptotic levels. Moreover, the limiting randomization scheme coincides with the unknown optimal randomization scheme when, simultaneously, the response model is correctly specified and the optimal scheme belongs to the limit of the user-supplied classes of randomization schemes. We illustrate the applicability of these general theoretical results with a LASSO- based CARA RCT. In this example, both the response model and the optimal treatment randomization are estimated using a sequence of LASSO logistic models that may increase with sample size. It follows immediately from our general theorems that this LASSO-based CARA RCT converges to a fixed design and yields consistent and asymptotically Gaussian effect estimates, under minimal conditions on the smoothness of the basis functions in the LASSO logistic models. We exemplify the proposed methods with a simulation study
Exact Non-Parametric Bayesian Inference on Infinite Trees
Given i.i.d. data from an unknown distribution, we consider the problem of
predicting future items. An adaptive way to estimate the probability density is
to recursively subdivide the domain to an appropriate data-dependent
granularity. A Bayesian would assign a data-independent prior probability to
"subdivide", which leads to a prior over infinite(ly many) trees. We derive an
exact, fast, and simple inference algorithm for such a prior, for the data
evidence, the predictive distribution, the effective model dimension, moments,
and other quantities. We prove asymptotic convergence and consistency results,
and illustrate the behavior of our model on some prototypical functions.Comment: 32 LaTeX pages, 9 figures, 5 theorems, 1 algorith
Computational barriers in minimax submatrix detection
This paper studies the minimax detection of a small submatrix of elevated
mean in a large matrix contaminated by additive Gaussian noise. To investigate
the tradeoff between statistical performance and computational cost from a
complexity-theoretic perspective, we consider a sequence of discretized models
which are asymptotically equivalent to the Gaussian model. Under the hypothesis
that the planted clique detection problem cannot be solved in randomized
polynomial time when the clique size is of smaller order than the square root
of the graph size, the following phase transition phenomenon is established:
when the size of the large matrix , if the submatrix size
for any , computational complexity
constraints can incur a severe penalty on the statistical performance in the
sense that any randomized polynomial-time test is minimax suboptimal by a
polynomial factor in ; if for any
, minimax optimal detection can be attained within
constant factors in linear time. Using Schatten norm loss as a representative
example, we show that the hardness of attaining the minimax estimation rate can
crucially depend on the loss function. Implications on the hardness of support
recovery are also obtained.Comment: Published at http://dx.doi.org/10.1214/14-AOS1300 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
- …