Nonparametric Feature Extraction from Dendrograms
We propose a nonparametric approach to extracting features from dendrograms.
Minimax distance measures correspond to building a dendrogram with the
single-linkage criterion and defining specific forms of a level function and a
distance function over it. We extend this construction to arbitrary
dendrograms and develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances respectively in solution space and in representation space in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features, in the spirit of multi-layered architectures, to obtain
the final features. Finally, we demonstrate the effectiveness of our approach
via several numerical studies.
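The single-linkage special case the abstract starts from can be made concrete: there, the minimax distance between two points equals the largest edge weight on the path joining them in a minimum spanning tree. Below is a minimal Python sketch of that special case only (not the paper's generalized framework, ensemble aggregation, or layered combination); the Euclidean base metric and the DFS-based path propagation are illustrative choices.

```python
# Minimal sketch: minimax (path-based) distances via the single-linkage /
# minimum-spanning-tree special case mentioned in the abstract.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree


def minimax_distances(X):
    """Pairwise minimax distances between the rows of X."""
    base = squareform(pdist(X))                    # Euclidean base distances
    mst = minimum_spanning_tree(base).toarray()
    mst = np.maximum(mst, mst.T)                   # symmetric MST adjacency matrix
    n = X.shape[0]
    out = np.zeros((n, n))
    for src in range(n):                           # propagate the max edge along MST paths
        visited = {src}
        stack = [(src, 0.0)]
        while stack:
            node, best = stack.pop()
            for nxt in np.flatnonzero(mst[node]):
                if nxt not in visited:
                    visited.add(nxt)
                    val = max(best, mst[node, nxt])
                    out[src, nxt] = val
                    stack.append((nxt, val))
    return out
```

An embedding of the resulting matrix, for instance classical multidimensional scaling, is one way to obtain the kind of vector-based representation the abstract refers to.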
Rodeo: Sparse Nonparametric Regression in High Dimensions
We present a greedy method for simultaneously performing local bandwidth
selection and variable selection in nonparametric regression. The method starts
with a local linear estimator with large bandwidths, and incrementally
decreases the bandwidth of variables for which the gradient of the estimator
with respect to bandwidth is large. The method--called rodeo (regularization of
derivative expectation operator)--conducts a sequence of hypothesis tests to
threshold derivatives, and is easy to implement. Under certain assumptions on
the regression function and sampling density, it is shown that the rodeo
applied to local linear smoothing avoids the curse of dimensionality, achieving
near-optimal minimax rates of convergence in the number of relevant variables,
as if these variables were isolated in advance.
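To make the greedy scheme concrete, here is a minimal Python sketch of the rodeo idea at a single query point. The finite-difference derivative, fixed threshold, and shrinkage factor below are illustrative stand-ins for the paper's hypothesis tests on the derivative expectation, not the actual test statistics.

```python
# Illustrative sketch: shrink the bandwidth of any variable the local linear
# fit is still sensitive to, leaving irrelevant variables at a large bandwidth.
import numpy as np


def local_linear(x0, X, y, h):
    """Local linear fit at query point x0 with per-variable bandwidths h."""
    w = np.exp(-0.5 * np.sum(((X - x0) / h) ** 2, axis=1))   # Gaussian product kernel
    Z = np.hstack([np.ones((len(X), 1)), X - x0])            # local design matrix
    WZ = Z * w[:, None]
    beta = np.linalg.pinv(Z.T @ WZ) @ (WZ.T @ y)             # weighted least squares
    return beta[0]                                           # intercept = fitted value at x0


def rodeo(x0, X, y, h0=1.0, shrink=0.9, threshold=1e-3, h_min=1e-2, eps=1e-4):
    """Greedily shrink bandwidths of variables the estimator still depends on."""
    d = X.shape[1]
    h = np.full(d, h0)
    active = list(range(d))
    while active:
        for j in list(active):
            # finite-difference surrogate for the derivative of the estimator
            # with respect to the j-th bandwidth
            h_pert = h.copy()
            h_pert[j] += eps
            deriv = (local_linear(x0, X, y, h_pert) - local_linear(x0, X, y, h)) / eps
            if abs(deriv) > threshold and h[j] * shrink > h_min:
                h[j] *= shrink                               # relevant variable: keep shrinking
            else:
                active.remove(j)                             # freeze this bandwidth
    return h, local_linear(x0, X, y, h)
```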
Sharp analysis of low-rank kernel matrix approximations
We consider supervised learning problems within the positive-definite kernel
framework, such as kernel ridge regression, kernel logistic regression or the
support vector machine. With kernels leading to infinite-dimensional feature
spaces, a common practical limiting difficulty is the necessity of computing
the kernel matrix, which most frequently leads to algorithms with running time
at least quadratic in the number of observations n, i.e., O(n^2). Low-rank
approximations of the kernel matrix are often considered as they allow the
reduction of running time complexities to O(p^2 n), where p is the rank of the
approximation. The practicality of such methods thus depends on the required
rank p. In this paper, we show that in the context of kernel ridge regression,
for approximations based on a random subset of columns of the original kernel
matrix, the rank p may be chosen to be linear in the degrees of freedom
associated with the problem, a quantity which is classically used in the
statistical analysis of such methods, and is often seen as the implicit number
of parameters of non-parametric estimators. This result enables simple
algorithms that have sub-quadratic running time complexity, but provably
exhibit the same predictive performance as existing algorithms, for any given
problem instance and not only in worst-case situations.
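The column-sampling scheme the abstract analyzes can be sketched as follows: pick p random columns of the kernel matrix and solve the resulting p-dimensional system, so the dominant cost is O(p^2 n). The RBF kernel, regularization value, and jitter term below are illustrative choices; how large p must be relative to the degrees of freedom is the paper's result and is not reproduced here.

```python
# Minimal sketch: kernel ridge regression with a random-column (Nystrom-type)
# rank-p approximation of the kernel matrix.
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_krr_fit(X, y, p, lam=1e-2, gamma=1.0, seed=0):
    """Fit on p random columns; the dominant cost is O(p^2 n), not O(n^2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=p, replace=False)   # random column subset
    K_np = rbf_kernel(X, X[idx], gamma)               # n x p block
    K_pp = rbf_kernel(X[idx], X[idx], gamma)          # p x p block
    A = K_np.T @ K_np + lam * K_pp                    # p x p system
    alpha = np.linalg.solve(A + 1e-10 * np.eye(p), K_np.T @ y)
    return idx, alpha


def nystrom_krr_predict(X_train, idx, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train[idx], gamma) @ alpha
```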
Efficient Optimal Learning for Contextual Bandits
We address the problem of learning in an online setting where the learner
repeatedly observes features, selects among a set of actions, and receives
reward for the action taken. We provide the first efficient algorithm with an
optimal regret. Our algorithm uses a cost sensitive classification learner as
an oracle and has a running time polylog(N), where N is the number
of classification rules among which the oracle might choose. This is
exponentially faster than all previous algorithms that achieve optimal regret
in this setting. Our formulation also enables us to create an algorithm with
regret that is additive rather than multiplicative in the feedback delay,
unlike all previous work.
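For context, a minimal epsilon-greedy sketch of the oracle-based interaction loop is shown below, with a linear classifier standing in for the cost-sensitive classification oracle. It only illustrates the protocol (observe features, pick an action, receive a reward, update through the oracle); it is not the paper's optimal-regret algorithm, and the reduction used to feed the oracle here is deliberately crude.

```python
# Illustrative epsilon-greedy contextual bandit loop with a classifier oracle.
import numpy as np
from sklearn.linear_model import SGDClassifier


def epsilon_greedy_bandit(contexts, reward_fn, n_actions, epsilon=0.1, seed=0):
    """contexts: (T, d) array; reward_fn(t, action) returns the observed reward."""
    rng = np.random.default_rng(seed)
    oracle = SGDClassifier()                         # stand-in for the cost-sensitive oracle
    classes = np.arange(n_actions)
    fitted, total_reward = False, 0.0
    for t, x in enumerate(contexts):
        if not fitted or rng.random() < epsilon:
            a = int(rng.integers(n_actions))         # explore uniformly at random
        else:
            a = int(oracle.predict(x[None, :])[0])   # exploit the oracle's greedy policy
        r = reward_fn(t, a)
        total_reward += r
        if r > 0:
            # crude reduction: treat actions that earned a reward as correct labels
            oracle.partial_fit(x[None, :], [a], classes=classes)
            fitted = True
    return total_reward
```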
Gains and Losses are Fundamentally Different in Regret Minimization: The Sparse Case
We demonstrate that, in the classical non-stochastic regret minimization
problem with d decisions, gains and losses to be respectively maximized or
minimized are fundamentally different. Indeed, by considering the additional
sparsity assumption (at each stage, at most s decisions incur a nonzero
outcome), we derive optimal regret bounds of different orders. Specifically,
with gains, we obtain an optimal regret guarantee after T stages of order
sqrt(T log s), so the classical dependency in the dimension is replaced by
the sparsity size. With losses, we provide matching upper and lower bounds of
order sqrt(T s log(d) / d), which is decreasing in d. Finally, we also study
the bandit setting, and obtain an upper bound of order sqrt(T s log(d/s)) when
outcomes are losses. This bound is proven to be optimal up to a logarithmic
factor.
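For reference, the full-information gains setting can be played with the standard exponentially weighted average forecaster, sketched below in Python. The fixed learning rate is a generic choice, not the sparsity-adapted tuning analyzed in the paper, and the sketch is only meant to make the regret quantity concrete.

```python
# Illustrative exponentially weighted average forecaster, full-information gains.
import numpy as np


def exponential_weights(gains, eta=0.1, seed=0):
    """gains: (T, d) array of stage gains, at most s entries nonzero per row."""
    rng = np.random.default_rng(seed)
    T, d = gains.shape
    log_w = np.zeros(d)                       # log-weights, for numerical stability
    total_gain = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                          # mixed strategy over the d decisions
        choice = rng.choice(d, p=p)
        total_gain += gains[t, choice]
        log_w += eta * gains[t]               # full information: all coordinates observed
    best_fixed = gains.sum(axis=0).max()      # cumulative gain of the best fixed decision
    return best_fixed - total_gain            # realized regret
```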