230 research outputs found
Simple regret for infinitely many armed bandits
We consider a stochastic bandit problem with infinitely many arms. In this
setting, the learner has no chance of trying all the arms even once and has to
dedicate its limited number of samples only to a certain number of arms. All
previous algorithms for this setting were designed for minimizing the
cumulative regret of the learner. In this paper, we propose an algorithm aiming
at minimizing the simple regret. As in the cumulative regret setting of
infinitely many armed bandits, the rate of the simple regret will depend on a
parameter characterizing the distribution of the near-optimal arms. We
prove that depending on , our algorithm is minimax optimal either up to
a multiplicative constant or up to a factor. We also provide
extensions to several important cases: when is unknown, in a natural
setting where the near-optimal arms have a small variance, and in the case of
unknown time horizon.Comment: in 32th International Conference on Machine Learning (ICML 2015
Second-Order Kernel Online Convex Optimization with Adaptive Sketching
Kernel online convex optimization (KOCO) is a framework combining the
expressiveness of non-parametric kernel models with the regret guarantees of
online learning. First-order KOCO methods such as functional gradient descent
require only time and space per iteration, and, when the only
information on the losses is their convexity, achieve a minimax optimal
regret. Nonetheless, many common losses in kernel
problems, such as squared loss, logistic loss, and squared hinge loss posses
stronger curvature that can be exploited. In this case, second-order KOCO
methods achieve regret, which
we show scales as , where
is the effective dimension of the problem and is usually much smaller than
. The main drawback of second-order methods is their
much higher space and time complexity. In this paper, we
introduce kernel online Newton step (KONS), a new second-order KOCO method that
also achieves regret. To address the
computational complexity of second-order methods, we introduce a new matrix
sketching algorithm for the kernel matrix , and show that for
a chosen parameter our Sketched-KONS reduces the space and time
complexity by a factor of to space and
time per iteration, while incurring only times more regret
A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption
We study the problem of optimizing a function under a \emph{budgeted number
of evaluations}. We only assume that the function is \emph{locally} smooth
around one of its global optima. The difficulty of optimization is measured in
terms of 1) the amount of \emph{noise} of the function evaluation and 2)
the local smoothness, , of the function. A smaller results in smaller
optimization error. We come with a new, simple, and parameter-free approach.
First, for all values of and , this approach recovers at least the
state-of-the-art regret guarantees. Second, our approach additionally obtains
these results while being \textit{agnostic} to the values of both and .
This leads to the first algorithm that naturally adapts to an \textit{unknown}
range of noise and leads to significant improvements in a moderate and
low-noise regime. Third, our approach also obtains a remarkable improvement
over the state-of-the-art SOO algorithm when the noise is very low which
includes the case of optimization under deterministic feedback (). There,
under our minimal local smoothness assumption, this improvement is of
exponential magnitude and holds for a class of functions that covers the vast
majority of functions that practitioners optimize (). We show that our
algorithmic improvement is borne out in experiments as we empirically show
faster convergence on common benchmarks
Spectral Bandits for Smooth Graph Functions
International audienceSmooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node and its expected rating is similar to its neighbors. The goal is to recommend items that have high expected ratings. We aim for the algorithms where the cumulative regret with respect to the optimal policy would not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of nodes evaluations
Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
We study the online influence maximization problem in social networks under
the independent cascade model. Specifically, we aim to learn the set of "best
influencers" in a social network online while repeatedly interacting with it.
We address the challenges of (i) combinatorial action space, since the number
of feasible influencer sets grows exponentially with the maximum number of
influencers, and (ii) limited feedback, since only the influenced portion of
the network is observed. Under a stochastic semi-bandit feedback, we propose
and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our
bounds on the cumulative regret are polynomial in all quantities of interest,
achieve near-optimal dependence on the number of interactions and reflect the
topology of the network and the activation probabilities of its edges, thereby
giving insights on the problem complexity. To the best of our knowledge, these
are the first such results. Our experiments show that in several representative
graph topologies, the regret of IMLinUCB scales as suggested by our upper
bounds. IMLinUCB permits linear generalization and thus is both statistically
and computationally suitable for large-scale problems. Our experiments also
show that IMLinUCB with linear generalization can lead to low regret in
real-world online influence maximization.Comment: Compared with the previous version, this version has fixed a mistake.
This version is also consistent with the NIPS camera-ready versio
Distance Metric Learning for Conditional Anomaly Detection
International audienceAnomaly detection methods can be very useful in identifying unusual or interesting patterns in data. A recently proposed conditional anomaly detection framework extends anomaly detection to the problem of identifying anomalous patterns on a subset of attributes in the data. The anomaly always depends (is conditioned) on the value of remaining attributes. The work presented in this paper focuses on instance-based methods for detecting conditional anomalies. The methods depend heavily on the distance metric that lets us identify examples in the dataset that are most critical for detecting the anomaly. To optimize the performance of the anomaly detection methods we explore and study metric learning methods. We evaluate the quality of our methods on the Pneumonia PORT dataset by detecting unusual admission decisions for patients with the community-acquired pneumonia. The results of our metric learning methods show an improved detection performance over standard distance metrics, which is very promising for building automated anomaly detection systems for variety of intelligent monitoring applications
Online combinatorial optimization with stochastic decision sets and adversarial losses
International audienceMost work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches
- …