k-NN Regression Adapts to Local Intrinsic Dimension
Many nonparametric regressors were recently shown to converge at rates that
depend only on the intrinsic dimension of data. These regressors thus escape
the curse of dimensionality when high-dimensional data has low intrinsic dimension
(e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic
dimension. In particular our rates are local to a query x and depend only on
the way masses of balls centered at x vary with radius.
Furthermore, we show a simple way to choose k = k(x) locally at any x so as
to nearly achieve the minimax rate at x in terms of the unknown intrinsic
dimension in the vicinity of x. We also establish that the minimax rate does
not depend on a particular choice of metric space or distribution, but rather
that this minimax rate holds for any metric space and doubling measure.
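The setting above can be made concrete with a minimal k-NN regressor. The sketch below (illustrative names and synthetic 1-D data, not the paper's estimator) simply averages the responses of the k training points nearest to a query x:

```python
import numpy as np

# Toy sketch of k-NN regression at a single query point x, assuming
# Euclidean distance; names and data are illustrative only.
def knn_regress(X, y, x, k):
    """Average the responses of the k training points nearest to x."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y[nearest].mean()

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 1))      # inputs on a 1-D domain
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(500)

x_query = np.array([0.5])
estimate = knn_regress(X, y, x_query, k=25)    # should be near sin(1.5)
```

A locally chosen k = k(x), as the paper proposes, would replace the fixed k=25 with a data-driven choice at each query; the point of the sketch is only the local-averaging structure that such a choice tunes.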
Non-Asymptotic Uniform Rates of Consistency for k-NN Regression
We derive high-probability finite-sample uniform rates of consistency for
k-NN regression that are optimal up to logarithmic factors under mild
assumptions. We moreover show that k-NN regression adapts to an unknown lower
intrinsic dimension automatically. We then apply the k-NN regression rates to
establish new results about estimating the level sets and global maxima of a
function from noisy observations.
Comment: In Proceedings of the 33rd AAAI Conference on Artificial Intelligence
(AAAI 2019).
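One way to read the "global maxima" application: smooth the noisy observations with k-NN regression and take the argmax of the smoothed surface over the observed inputs. A toy plug-in sketch on synthetic data (not the paper's procedure):

```python
import numpy as np

def knn_smooth(X, y, q, k):
    """k-NN regression estimate at query q."""
    d = np.linalg.norm(X - q, axis=1)
    return y[np.argsort(d)[:k]].mean()

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 1))
f = lambda t: 1.0 - 10.0 * (t - 0.3) ** 2      # true maximum at t = 0.3
y = f(X[:, 0]) + 0.05 * rng.standard_normal(400)

# Plug-in estimate of the maximiser: argmax of the smoothed values
# over the observed inputs.
smoothed = np.array([knn_smooth(X, y, xi, k=30) for xi in X])
x_hat = X[np.argmax(smoothed), 0]              # should land near 0.3
```

Smoothing first matters: the raw argmax of the noisy y would be pulled toward whichever observation drew the largest noise, while the k-NN average suppresses that noise at the cost of a small bias near the peak.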
The intrinsic value of HFO features as a biomarker of epileptic activity
High frequency oscillations (HFOs) are a promising biomarker of epileptic
brain tissue and activity. HFOs additionally serve as a prototypical example of
challenges in the analysis of discrete events in high-temporal resolution,
intracranial EEG data. Two primary challenges are 1) dimensionality reduction,
and 2) assessing feasibility of classification. Dimensionality reduction
assumes that the data lie on a manifold with dimension less than that of the
feature space. However, previous HFO analyses have assumed a linear manifold,
global across time, space (i.e. recording electrode/channel), and individual
patients. Instead, we assess both a) whether linear methods are appropriate and
b) the consistency of the manifold across time, space, and patients. We also
estimate bounds on the Bayes classification error to quantify the distinction
between two classes of HFOs (those occurring during seizures and those
occurring due to other processes). This analysis provides the foundation for
future clinical use of HFO features and guides the analysis of other discrete
events, such as individual action potentials or multi-unit activity.
Comment: 5 pages, 5 figures.
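The linearity question the abstract raises can be illustrated on toy data (not HFO recordings): points on a unit circle have intrinsic dimension 1, yet a linear method such as PCA needs both ambient directions to capture their variance, so a linear-manifold assumption would overstate the dimension.

```python
import numpy as np

# Synthetic illustration: a 1-D manifold (circle) embedded in 2-D.
rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = s**2 / (s**2).sum()
# Each component explains roughly half the variance: no linear 1-D
# subspace summarises this 1-D manifold, so nonlinear methods are needed.
```

The same diagnostic (how fast the PCA spectrum decays versus what a nonlinear method recovers) is one simple way to assess whether the linear assumption is appropriate for a given set of HFO features.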
Global and Local Two-Sample Tests via Regression
Two-sample testing is a fundamental problem in statistics. Despite its long
history, there has been renewed interest in this problem with the advent of
high-dimensional and complex data. Specifically, in the machine learning
literature, there have been recent methodological developments such as
classification accuracy tests. The goal of this work is to present a regression
approach to comparing multivariate distributions of complex data. Depending on
the chosen regression model, our framework can efficiently handle different
types of variables and various structures in the data, with competitive power
under many practical scenarios. Whereas previous work has been largely limited
to global tests which conceal much of the local information, our approach
naturally leads to a local two-sample testing framework in which we identify
local differences between multivariate distributions with statistical
confidence. We demonstrate the efficacy of our approach both theoretically and
empirically, under some well-known parametric and nonparametric regression
methods. Our proposed methods are applied to simulated data as well as a
challenging astronomy data set to assess their practical usefulness.
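For context, the classification-accuracy test the abstract cites as prior work (not the paper's regression framework) can be sketched in a few lines: train a classifier to distinguish the two samples, and if its held-out accuracy clearly beats chance, the distributions likely differ. A toy version with synthetic Gaussians and a 1-NN classifier:

```python
import numpy as np

rng = np.random.default_rng(3)
P = rng.normal(0.0, 1.0, size=(300, 2))   # sample from P
Q = rng.normal(1.5, 1.0, size=(300, 2))   # sample from Q (mean-shifted)

Z = np.vstack([P, Q])
labels = np.r_[np.zeros(300), np.ones(300)]

# Split into train/test halves, fit 1-NN on train, score on test.
idx = rng.permutation(600)
train, test = idx[:300], idx[300:]

def predict(q):
    d = np.linalg.norm(Z[train] - q, axis=1)
    return labels[train][np.argmin(d)]

acc = np.mean([predict(Z[i]) == labels[i] for i in test])
# Under H0 (P = Q) accuracy hovers near 0.5; the mean shift here
# pushes it well above chance.
```

A global test of this kind reports only one number; the paper's point is that a regression of the group label on the covariates additionally localizes *where* the two distributions differ.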
The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams
Stream mining poses unique challenges to machine learning: predictive models
are required to be scalable, incrementally trainable, must remain bounded in
size (even when the data stream is arbitrarily long), and be nonparametric in
order to achieve high accuracy even in complex and dynamic environments.
Moreover, the learning system must be parameterless (traditional tuning
methods are problematic in streaming settings) and must avoid requiring prior
knowledge of the number of distinct class labels occurring in the stream. In
this paper, we introduce a new algorithmic approach for nonparametric learning
in data streams. Our approach addresses all of the above challenges by
learning a model that covers the input space using simple local classifiers.
The distribution of these classifiers dynamically adapts to the local (unknown)
complexity of the classification problem, thus achieving a good balance between
model complexity and predictive accuracy. We design four variants of our
approach of increasing adaptivity. By means of an extensive empirical
evaluation against standard nonparametric baselines, we show state-of-the-art
results in terms of accuracy versus model size. For the variant that imposes a
strict bound on the model size, we show better performance against all other
methods measured at the same model size value. Our empirical analysis is
complemented by a theoretical performance guarantee which does not rely on any
stochastic assumption on the source generating the stream.
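The general idea of covering the input space with simple local classifiers under a strict size bound can be sketched with a mistake-driven toy model. This is an illustrative sketch, not the ABACOC algorithm: on a mistake, the point is stored as a new labeled center (until a size cap is hit); prediction uses the label of the nearest stored center.

```python
import numpy as np

class BallStreamClassifier:
    """Toy mistake-driven online classifier (illustrative only, NOT
    ABACOC): keep labeled centers, predict with the nearest one, and add
    a new center only on a mistake, up to a strict size bound."""

    def __init__(self, max_centers=100):
        self.max_centers = max_centers
        self.centers = []
        self.labels = []

    def predict(self, x):
        if not self.centers:
            return None
        d = np.linalg.norm(np.array(self.centers) - x, axis=1)
        return self.labels[int(np.argmin(d))]

    def partial_fit(self, x, y):
        if self.predict(x) != y and len(self.centers) < self.max_centers:
            self.centers.append(np.asarray(x, dtype=float))
            self.labels.append(y)

rng = np.random.default_rng(4)
clf = BallStreamClassifier(max_centers=100)
for _ in range(2000):                          # simulate the stream
    x = rng.uniform(-1.0, 1.0, size=2)
    clf.partial_fit(x, int(x[0] * x[1] > 0))   # XOR-style concept

fresh = rng.uniform(-1.0, 1.0, size=(500, 2))
acc = np.mean([clf.predict(x) == int(x[0] * x[1] > 0) for x in fresh])
# Model size stays bounded while accuracy rises well above chance.
```

Because new centers are created only where the current model errs, they concentrate near the decision boundary, which is the same intuition behind adapting the cover to the local complexity of the problem.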