k-NN Regression Adapts to Local Intrinsic Dimension
Many nonparametric regressors were recently shown to converge at rates that
depend only on the intrinsic dimension of data. These regressors thus escape
the curse of dimensionality when high-dimensional data has low intrinsic dimension
(e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic
dimension. In particular our rates are local to a query x and depend only on
the way masses of balls centered at x vary with radius.
Furthermore, we show a simple way to choose k = k(x) locally at any x so as
to nearly achieve the minimax rate at x in terms of the unknown intrinsic
dimension in the vicinity of x. We also establish that the minimax rate does
not depend on a particular choice of metric space or distribution, but rather
that this minimax rate holds for any metric space and doubling measure.
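The setting above can be made concrete with a minimal k-NN regressor. The sketch below (illustrative names and synthetic 1-D data, not the paper's estimator) simply averages the responses of the k training points nearest to a query x:

```python
import numpy as np

# Toy sketch of k-NN regression at a single query point x, assuming
# Euclidean distance; names and data are illustrative only.
def knn_regress(X, y, x, k):
    """Average the responses of the k training points nearest to x."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y[nearest].mean()

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 1))      # inputs on a 1-D domain
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(500)

x_query = np.array([0.5])
estimate = knn_regress(X, y, x_query, k=25)    # should be near sin(1.5)
```

A locally chosen k = k(x), as the paper proposes, would replace the fixed k=25 with a data-driven choice at each query; the point of the sketch is only the local-averaging structure that such a choice tunes.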
Non-Asymptotic Uniform Rates of Consistency for k-NN Regression
We derive high-probability finite-sample uniform rates of consistency for
k-NN regression that are optimal up to logarithmic factors under mild
assumptions. We moreover show that k-NN regression adapts to an unknown lower
intrinsic dimension automatically. We then apply the k-NN regression rates to
establish new results about estimating the level sets and global maxima of a
function from noisy observations.
Comment: In Proceedings of the 33rd AAAI Conference on Artificial Intelligence
(AAAI 2019).
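One way to read the "global maxima" application: smooth the noisy observations with k-NN regression and take the argmax of the smoothed surface over the observed inputs. A toy plug-in sketch on synthetic data (not the paper's procedure):

```python
import numpy as np

def knn_smooth(X, y, q, k):
    """k-NN regression estimate at query q."""
    d = np.linalg.norm(X - q, axis=1)
    return y[np.argsort(d)[:k]].mean()

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(400, 1))
f = lambda t: 1.0 - 10.0 * (t - 0.3) ** 2      # true maximum at t = 0.3
y = f(X[:, 0]) + 0.05 * rng.standard_normal(400)

# Plug-in estimate of the maximiser: argmax of the smoothed values
# over the observed inputs.
smoothed = np.array([knn_smooth(X, y, xi, k=30) for xi in X])
x_hat = X[np.argmax(smoothed), 0]              # should land near 0.3
```

Smoothing first matters: the raw argmax of the noisy y would be pulled toward whichever observation drew the largest noise, while the k-NN average suppresses that noise at the cost of a small bias near the peak.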
The intrinsic value of HFO features as a biomarker of epileptic activity
High frequency oscillations (HFOs) are a promising biomarker of epileptic
brain tissue and activity. HFOs additionally serve as a prototypical example of
challenges in the analysis of discrete events in high-temporal resolution,
intracranial EEG data. Two primary challenges are 1) dimensionality reduction,
and 2) assessing feasibility of classification. Dimensionality reduction
assumes that the data lie on a manifold with dimension less than that of the
feature space. However, previous HFO analyses have assumed a linear manifold,
global across time, space (i.e. recording electrode/channel), and individual
patients. Instead, we assess both a) whether linear methods are appropriate and
b) the consistency of the manifold across time, space, and patients. We also
estimate bounds on the Bayes classification error to quantify the distinction
between two classes of HFOs (those occurring during seizures and those
occurring due to other processes). This analysis provides the foundation for
future clinical use of HFO features and guides the analysis of other discrete
events, such as individual action potentials or multi-unit activity.
Comment: 5 pages, 5 figures.
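The linearity question the abstract raises can be illustrated on toy data (not HFO recordings): points on a unit circle have intrinsic dimension 1, yet a linear method such as PCA needs both ambient directions to capture their variance, so a linear-manifold assumption would overstate the dimension.

```python
import numpy as np

# Synthetic illustration: a 1-D manifold (circle) embedded in 2-D.
rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = s**2 / (s**2).sum()
# Each component explains roughly half the variance: no linear 1-D
# subspace summarises this 1-D manifold, so nonlinear methods are needed.
```

The same diagnostic (how fast the PCA spectrum decays versus what a nonlinear method recovers) is one simple way to assess whether the linear assumption is appropriate for a given set of HFO features.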
Global and Local Two-Sample Tests via Regression
Two-sample testing is a fundamental problem in statistics. Despite its long
history, there has been renewed interest in this problem with the advent of
high-dimensional and complex data. Specifically, in the machine learning
literature, there have been recent methodological developments such as
classification accuracy tests. The goal of this work is to present a regression
approach to comparing multivariate distributions of complex data. Depending on
the chosen regression model, our framework can efficiently handle different
types of variables and various structures in the data, with competitive power
under many practical scenarios. Whereas previous work has been largely limited
to global tests which conceal much of the local information, our approach
naturally leads to a local two-sample testing framework in which we identify
local differences between multivariate distributions with statistical
confidence. We demonstrate the efficacy of our approach both theoretically and
empirically, under some well-known parametric and nonparametric regression
methods. Our proposed methods are applied to simulated data as well as a
challenging astronomy data set to assess their practical usefulness.
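For context, the classification-accuracy test the abstract cites as prior work (not the paper's regression framework) can be sketched in a few lines: train a classifier to distinguish the two samples, and if its held-out accuracy clearly beats chance, the distributions likely differ. A toy version with synthetic Gaussians and a 1-NN classifier:

```python
import numpy as np

rng = np.random.default_rng(3)
P = rng.normal(0.0, 1.0, size=(300, 2))   # sample from P
Q = rng.normal(1.5, 1.0, size=(300, 2))   # sample from Q (mean-shifted)

Z = np.vstack([P, Q])
labels = np.r_[np.zeros(300), np.ones(300)]

# Split into train/test halves, fit 1-NN on train, score on test.
idx = rng.permutation(600)
train, test = idx[:300], idx[300:]

def predict(q):
    d = np.linalg.norm(Z[train] - q, axis=1)
    return labels[train][np.argmin(d)]

acc = np.mean([predict(Z[i]) == labels[i] for i in test])
# Under H0 (P = Q) accuracy hovers near 0.5; the mean shift here
# pushes it well above chance.
```

A global test of this kind reports only one number; the paper's point is that a regression of the group label on the covariates additionally localizes *where* the two distributions differ.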
The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams
Stream mining poses unique challenges to machine learning: predictive models
are required to be scalable, incrementally trainable, must remain bounded in
size (even when the data stream is arbitrarily long), and be nonparametric in
order to achieve high accuracy even in complex and dynamic environments.
Moreover, the learning system must be parameterless (traditional tuning
methods are problematic in streaming settings) and must avoid requiring prior
knowledge of the number of distinct class labels occurring in the stream. In
this paper, we introduce a new algorithmic approach for nonparametric learning
in data streams. Our approach addresses all of the above challenges by
learning a model that covers the input space using simple local classifiers.
The distribution of these classifiers dynamically adapts to the local (unknown)
complexity of the classification problem, thus achieving a good balance between
model complexity and predictive accuracy. We design four variants of our
approach of increasing adaptivity. By means of an extensive empirical
evaluation against standard nonparametric baselines, we show state-of-the-art
results in terms of accuracy versus model size. For the variant that imposes a
strict bound on the model size, we show better performance against all other
methods measured at the same model size value. Our empirical analysis is
complemented by a theoretical performance guarantee which does not rely on any
stochastic assumption on the source generating the stream.
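The general idea of covering the input space with simple local classifiers under a strict size bound can be sketched with a mistake-driven toy model. This is an illustrative sketch, not the ABACOC algorithm: on a mistake, the point is stored as a new labeled center (until a size cap is hit); prediction uses the label of the nearest stored center.

```python
import numpy as np

class BallStreamClassifier:
    """Toy mistake-driven online classifier (illustrative only, NOT
    ABACOC): keep labeled centers, predict with the nearest one, and add
    a new center only on a mistake, up to a strict size bound."""

    def __init__(self, max_centers=100):
        self.max_centers = max_centers
        self.centers = []
        self.labels = []

    def predict(self, x):
        if not self.centers:
            return None
        d = np.linalg.norm(np.array(self.centers) - x, axis=1)
        return self.labels[int(np.argmin(d))]

    def partial_fit(self, x, y):
        if self.predict(x) != y and len(self.centers) < self.max_centers:
            self.centers.append(np.asarray(x, dtype=float))
            self.labels.append(y)

rng = np.random.default_rng(4)
clf = BallStreamClassifier(max_centers=100)
for _ in range(2000):                          # simulate the stream
    x = rng.uniform(-1.0, 1.0, size=2)
    clf.partial_fit(x, int(x[0] * x[1] > 0))   # XOR-style concept

fresh = rng.uniform(-1.0, 1.0, size=(500, 2))
acc = np.mean([clf.predict(x) == int(x[0] * x[1] > 0) for x in fresh])
# Model size stays bounded while accuracy rises well above chance.
```

Because new centers are created only where the current model errs, they concentrate near the decision boundary, which is the same intuition behind adapting the cover to the local complexity of the problem.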