Nonparametric Feature Extraction from Dendrograms
We propose a nonparametric approach to extracting features from dendrograms.
Minimax distance measures correspond to building a dendrogram with the
single-linkage criterion and defining specific forms of a level function and a
distance function over it. We extend this construction to arbitrary
dendrograms and develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features, in the spirit of multi-layered architectures, to obtain
the final features. Finally, we demonstrate the effectiveness of our approach
via several numerical studies.
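The single-linkage special case the abstract starts from can be made concrete: there, the minimax distance between two points equals the largest edge weight on the path joining them in a minimum spanning tree. Below is a minimal Python sketch of that special case only (not the paper's generalized framework, ensemble aggregation, or layered combination); the Euclidean base metric and the DFS-based path propagation are illustrative choices.

```python
# Minimal sketch: minimax (path-based) distances via the single-linkage /
# minimum-spanning-tree special case mentioned in the abstract.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree


def minimax_distances(X):
    """Pairwise minimax distances between the rows of X."""
    base = squareform(pdist(X))                    # Euclidean base distances
    mst = minimum_spanning_tree(base).toarray()
    mst = np.maximum(mst, mst.T)                   # symmetric MST adjacency matrix
    n = X.shape[0]
    out = np.zeros((n, n))
    for src in range(n):                           # propagate the max edge along MST paths
        visited = {src}
        stack = [(src, 0.0)]
        while stack:
            node, best = stack.pop()
            for nxt in np.flatnonzero(mst[node]):
                if nxt not in visited:
                    visited.add(nxt)
                    val = max(best, mst[node, nxt])
                    out[src, nxt] = val
                    stack.append((nxt, val))
    return out
```

An embedding of the resulting matrix, for instance classical multidimensional scaling, is one way to obtain the kind of vector-based representation the abstract refers to.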
Rodeo: Sparse Nonparametric Regression in High Dimensions
We present a greedy method for simultaneously performing local bandwidth
selection and variable selection in nonparametric regression. The method starts
with a local linear estimator with large bandwidths, and incrementally
decreases the bandwidth of variables for which the gradient of the estimator
with respect to bandwidth is large. The method--called rodeo (regularization of
derivative expectation operator)--conducts a sequence of hypothesis tests to
threshold derivatives, and is easy to implement. Under certain assumptions on
the regression function and sampling density, it is shown that the rodeo
applied to local linear smoothing avoids the curse of dimensionality, achieving
near-optimal minimax rates of convergence in the number of relevant variables,
as if these variables were isolated in advance.
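To make the greedy scheme concrete, here is a minimal Python sketch of the rodeo idea at a single query point. The finite-difference derivative, fixed threshold, and shrinkage factor below are illustrative stand-ins for the paper's hypothesis tests on the derivative expectation, not the actual test statistics.

```python
# Illustrative sketch: shrink the bandwidth of any variable the local linear
# fit is still sensitive to, leaving irrelevant variables at a large bandwidth.
import numpy as np


def local_linear(x0, X, y, h):
    """Local linear fit at query point x0 with per-variable bandwidths h."""
    w = np.exp(-0.5 * np.sum(((X - x0) / h) ** 2, axis=1))   # Gaussian product kernel
    Z = np.hstack([np.ones((len(X), 1)), X - x0])            # local design matrix
    WZ = Z * w[:, None]
    beta = np.linalg.pinv(Z.T @ WZ) @ (WZ.T @ y)             # weighted least squares
    return beta[0]                                           # intercept = fitted value at x0


def rodeo(x0, X, y, h0=1.0, shrink=0.9, threshold=1e-3, h_min=1e-2, eps=1e-4):
    """Greedily shrink bandwidths of variables the estimator still depends on."""
    d = X.shape[1]
    h = np.full(d, h0)
    active = list(range(d))
    while active:
        for j in list(active):
            # finite-difference surrogate for the derivative of the estimator
            # with respect to the j-th bandwidth
            h_pert = h.copy()
            h_pert[j] += eps
            deriv = (local_linear(x0, X, y, h_pert) - local_linear(x0, X, y, h)) / eps
            if abs(deriv) > threshold and h[j] * shrink > h_min:
                h[j] *= shrink                               # relevant variable: keep shrinking
            else:
                active.remove(j)                             # freeze this bandwidth
    return h, local_linear(x0, X, y, h)
```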
Sharp analysis of low-rank kernel matrix approximations
We consider supervised learning problems within the positive-definite kernel
framework, such as kernel ridge regression, kernel logistic regression or the
support vector machine. With kernels leading to infinite-dimensional feature
spaces, a common practical limiting difficulty is the necessity of computing
the kernel matrix, which most frequently leads to algorithms with running time
at least quadratic in the number of observations n, i.e., O(n^2). Low-rank
approximations of the kernel matrix are often considered as they allow the
reduction of running time complexities to O(p^2 n), where p is the rank of the
approximation. The practicality of such methods thus depends on the required
rank p. In this paper, we show that in the context of kernel ridge regression,
for approximations based on a random subset of columns of the original kernel
matrix, the rank p may be chosen to be linear in the degrees of freedom
associated with the problem, a quantity which is classically used in the
statistical analysis of such methods, and is often seen as the implicit number
of parameters of non-parametric estimators. This result enables simple
algorithms that have sub-quadratic running time complexity, but provably
exhibit the same predictive performance as existing algorithms, for any given
problem instance and not only in worst-case situations.
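The column-sampling scheme the abstract analyzes can be sketched as follows: pick p random columns of the kernel matrix and solve the resulting p-dimensional system, so the dominant cost is O(p^2 n). The RBF kernel, regularization value, and jitter term below are illustrative choices; how large p must be relative to the degrees of freedom is the paper's result and is not reproduced here.

```python
# Minimal sketch: kernel ridge regression with a random-column (Nystrom-type)
# rank-p approximation of the kernel matrix.
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_krr_fit(X, y, p, lam=1e-2, gamma=1.0, seed=0):
    """Fit on p random columns; the dominant cost is O(p^2 n), not O(n^2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=p, replace=False)   # random column subset
    K_np = rbf_kernel(X, X[idx], gamma)               # n x p block
    K_pp = rbf_kernel(X[idx], X[idx], gamma)          # p x p block
    A = K_np.T @ K_np + lam * K_pp                    # p x p system
    alpha = np.linalg.solve(A + 1e-10 * np.eye(p), K_np.T @ y)
    return idx, alpha


def nystrom_krr_predict(X_train, idx, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train[idx], gamma) @ alpha
```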
Efficient Optimal Learning for Contextual Bandits
We address the problem of learning in an online setting where the learner
repeatedly observes features, selects among a set of actions, and receives
reward for the action taken. We provide the first efficient algorithm with an
optimal regret. Our algorithm uses a cost sensitive classification learner as
an oracle and has a running time polylog(N), where N is the number
of classification rules among which the oracle might choose. This is
exponentially faster than all previous algorithms that achieve optimal regret
in this setting. Our formulation also enables us to create an algorithm with
regret that is additive rather than multiplicative in the feedback delay,
unlike all previous work.
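For context, a minimal epsilon-greedy sketch of the oracle-based interaction loop is shown below, with a linear classifier standing in for the cost-sensitive classification oracle. It only illustrates the protocol (observe features, pick an action, receive a reward, update through the oracle); it is not the paper's optimal-regret algorithm, and the reduction used to feed the oracle here is deliberately crude.

```python
# Illustrative epsilon-greedy contextual bandit loop with a classifier oracle.
import numpy as np
from sklearn.linear_model import SGDClassifier


def epsilon_greedy_bandit(contexts, reward_fn, n_actions, epsilon=0.1, seed=0):
    """contexts: (T, d) array; reward_fn(t, action) returns the observed reward."""
    rng = np.random.default_rng(seed)
    oracle = SGDClassifier()                         # stand-in for the cost-sensitive oracle
    classes = np.arange(n_actions)
    fitted, total_reward = False, 0.0
    for t, x in enumerate(contexts):
        if not fitted or rng.random() < epsilon:
            a = int(rng.integers(n_actions))         # explore uniformly at random
        else:
            a = int(oracle.predict(x[None, :])[0])   # exploit the oracle's greedy policy
        r = reward_fn(t, a)
        total_reward += r
        if r > 0:
            # crude reduction: treat actions that earned a reward as correct labels
            oracle.partial_fit(x[None, :], [a], classes=classes)
            fitted = True
    return total_reward
```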
Gains and Losses are Fundamentally Different in Regret Minimization: The Sparse Case
We demonstrate that, in the classical non-stochastic regret minimization
problem with d decisions, gains and losses to be respectively maximized or
minimized are fundamentally different. Indeed, by considering the additional
sparsity assumption (at each stage, at most s decisions incur a nonzero
outcome), we derive optimal regret bounds of different orders. Specifically,
with gains, we obtain an optimal regret guarantee after T stages of order
sqrt(T log s), so the classical dependency in the dimension is replaced by
the sparsity size. With losses, we provide matching upper and lower bounds of
order sqrt(T s log(d) / d), which is decreasing in d. Finally, we also study
the bandit setting, and obtain an upper bound of order sqrt(T s log(d/s)) when
outcomes are losses. This bound is proven to be optimal up to a logarithmic
factor.
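For reference, the full-information gains setting can be played with the standard exponentially weighted average forecaster, sketched below in Python. The fixed learning rate is a generic choice, not the sparsity-adapted tuning analyzed in the paper, and the sketch is only meant to make the regret quantity concrete.

```python
# Illustrative exponentially weighted average forecaster, full-information gains.
import numpy as np


def exponential_weights(gains, eta=0.1, seed=0):
    """gains: (T, d) array of stage gains, at most s entries nonzero per row."""
    rng = np.random.default_rng(seed)
    T, d = gains.shape
    log_w = np.zeros(d)                       # log-weights, for numerical stability
    total_gain = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                          # mixed strategy over the d decisions
        choice = rng.choice(d, p=p)
        total_gain += gains[t, choice]
        log_w += eta * gains[t]               # full information: all coordinates observed
    best_fixed = gains.sum(axis=0).max()      # cumulative gain of the best fixed decision
    return best_fixed - total_gain            # realized regret
```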