k-NN Regression Adapts to Local Intrinsic Dimension
Many nonparametric regressors were recently shown to converge at rates that
depend only on the intrinsic dimension of the data. These regressors thus escape
the curse of dimensionality when high-dimensional data has low intrinsic dimension
(e.g. a manifold). We show that k-NN regression is also adaptive to intrinsic
dimension. In particular, our rates are local to a query x and depend only on
the way masses of balls centered at x vary with radius.
Furthermore, we show a simple way to choose k = k(x) locally at any x so as
to nearly achieve the minimax rate at x in terms of the unknown intrinsic
dimension in the vicinity of x. We also establish that the minimax rate does
not depend on a particular choice of metric space or distribution, but rather
that this minimax rate holds for any metric space and doubling measure.
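The fixed-k averaging that underlies these rates can be sketched as follows (a minimal Python illustration, not the paper's method: the synthetic data, the embedded 1-D curve, and the fixed k are all assumptions made for the example, whereas the paper's contribution is choosing k = k(x) locally at each query):

```python
import numpy as np

def knn_regress(X, y, x, k):
    """Predict at query x by averaging the y-values of its k nearest neighbors."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y[nearest].mean()

# Toy data: a 1-D manifold (a curve) embedded in 3-D ambient space, so the
# intrinsic dimension near any query is 1 even though the ambient dimension is 3.
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 500)
X = np.column_stack([t, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
y = t ** 2 + rng.normal(0, 0.01, 500)

x_query = np.array([0.5, 0.0, -1.0])  # the curve point at t = 0.5
print(knn_regress(X, y, x_query, k=10))  # close to 0.5**2 = 0.25
```

Because the neighbors are found in the 3-D ambient space but lie on a 1-D curve, the estimate behaves like a 1-D regressor, which is the adaptivity phenomenon the abstract describes.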
The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams
Stream mining poses unique challenges to machine learning: predictive models
are required to be scalable, incrementally trainable, must remain bounded in
size (even when the data stream is arbitrarily long), and be nonparametric in
order to achieve high accuracy even in complex and dynamic environments.
Moreover, the learning system must be parameterless (traditional tuning
methods are problematic in streaming settings) and avoid requiring prior
knowledge of the number of distinct class labels occurring in the stream. In
this paper, we introduce a new algorithmic approach for nonparametric learning
in data streams. Our approach addresses all of the above-mentioned challenges by
learning a model that covers the input space using simple local classifiers.
The distribution of these classifiers dynamically adapts to the local (unknown)
complexity of the classification problem, thus achieving a good balance between
model complexity and predictive accuracy. We design four variants of our
approach of increasing adaptivity. By means of an extensive empirical
evaluation against standard nonparametric baselines, we show state-of-the-art
results in terms of accuracy versus model size. For the variant that imposes a
strict bound on the model size, we show better performance than all other
methods measured at the same model size. Our empirical analysis is
complemented by a theoretical performance guarantee which does not rely on any
stochastic assumption on the source generating the stream.
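The idea of covering the input space with simple local classifiers can be sketched as follows (an illustrative toy, not the actual ABACOC algorithm: the ball-update rules, the shrink factor, and the initial radius are all assumptions made for the example):

```python
import numpy as np

class BallCoverClassifier:
    """Incremental classifier that covers the input space with balls.

    A simplified sketch of the general idea: each stored ball has a center,
    a radius, and per-label counts. A point falling outside every ball spawns
    a new ball, and a ball that misclassifies shrinks, so the cover refines
    where the problem is locally hard.
    """

    def __init__(self, init_radius=1.0):
        self.init_radius = init_radius
        self.centers = []   # ball centers
        self.radii = []     # ball radii
        self.counts = []    # per-ball label counts: {label: count}

    def predict(self, x):
        if not self.centers:
            return None
        d = [np.linalg.norm(c - x) for c in self.centers]
        i = int(np.argmin(d))
        return max(self.counts[i], key=self.counts[i].get)

    def partial_fit(self, x, label):
        pred = self.predict(x)
        if not self.centers:
            self.centers.append(x)
            self.radii.append(self.init_radius)
            self.counts.append({label: 1})
            return
        d = [np.linalg.norm(c - x) for c in self.centers]
        i = int(np.argmin(d))
        if d[i] <= self.radii[i]:
            # Point covered: update the nearest ball's label counts.
            self.counts[i][label] = self.counts[i].get(label, 0) + 1
            if pred != label:
                # Shrink on mistakes (factor 0.9 is an arbitrary choice here).
                self.radii[i] *= 0.9
        else:
            # Uncovered point: open a new ball at x.
            self.centers.append(x)
            self.radii.append(self.init_radius)
            self.counts.append({label: 1})

# Usage: stream two well-separated Gaussian classes, predicting before updating.
clf = BallCoverClassifier(init_radius=2.0)
rng = np.random.default_rng(0)
correct = total = 0
for _ in range(500):
    label = int(rng.integers(2))
    x = rng.normal(loc=4.0 * label, scale=0.5, size=2)
    pred = clf.predict(x)
    if pred is not None:
        correct += pred == label
        total += 1
    clf.partial_fit(x, label)
print(correct / total)
```

Note how the model stays small on this easy stream (a few balls suffice), while a harder class boundary would accumulate more, smaller balls, mirroring the adaptivity-versus-size trade-off the abstract describes.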
Towards meta-learning for multi-target regression problems
Several multi-target regression methods have been developed in recent years,
aiming at improving predictive performance by exploiting inter-target
correlations within the problem. However, no single method outperforms the
others on all problems. This motivates the development of automatic approaches
to recommend the most suitable multi-target regression method. In this paper, we
propose a meta-learning system to recommend the best predictive method for a
given multi-target regression problem. We performed experiments with a
meta-dataset generated by a total of 648 synthetic datasets. These datasets
were created to exhibit distinct inter-target characteristics, toward
recommending the most promising method. In experiments, we evaluated four
different algorithms with different biases as meta-learners. Our meta-dataset
is composed of 58 meta-features, based on statistical information, correlation
characteristics, linear landmarking, and the distribution and smoothness of
the data, and has four different meta-labels. Results showed that the induced
meta-models were able to recommend the best method for different base-level
datasets with a balanced accuracy above 70% using a Random Forest
meta-model, which statistically outperformed the meta-learning baselines.
Comment: To appear in the 8th Brazilian Conference on Intelligent Systems
(BRACIS).
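The recommendation step can be sketched as follows (a hypothetical toy: the random meta-features, the method names, and the nearest-neighbor meta-learner are all assumptions made for the example; the paper's best-performing meta-learner was a Random Forest):

```python
import numpy as np

# Hypothetical meta-dataset: each row summarizes one base-level dataset via
# meta-features (e.g. statistical, correlation, and landmarking summaries);
# each meta-label names the best multi-target regression method for that dataset.
rng = np.random.default_rng(1)
meta_features = rng.normal(size=(40, 5))
methods = np.array(["A", "B", "C", "D"])  # hypothetical method names
# Tie the label to the features so the recommendation task is learnable.
meta_labels = methods[(meta_features[:, 0] > 0).astype(int)
                      + 2 * (meta_features[:, 1] > 0).astype(int)]

def recommend(query, X, y, k=3):
    """Recommend a method for a new dataset by majority vote over the k
    nearest rows of the meta-dataset (a Random Forest meta-model, as in the
    paper, would slot in here; k-NN keeps the sketch dependency-free)."""
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    labels, counts = np.unique(y[idx], return_counts=True)
    return labels[np.argmax(counts)]

# Usage: compute the meta-features of a new dataset, then query the meta-model.
print(recommend(meta_features[0], meta_features, meta_labels))
```

The practical appeal is that computing meta-features for a new dataset is far cheaper than cross-validating every candidate multi-target regression method on it.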