Improved Second-Order Bounds for Prediction with Expert Advice
This work studies external regret in sequential prediction games with both
positive and negative payoffs. External regret measures the difference between
the payoff obtained by the forecasting strategy and the payoff of the best
action. In this setting, we derive new and sharper regret bounds for the
well-known exponentially weighted average forecaster and for a new forecaster
with a different multiplicative update rule. Our analysis has two main
advantages: first, no preliminary knowledge about the payoff sequence is
needed, not even its range; second, our bounds are expressed in terms of sums
of squared payoffs, replacing larger first-order quantities appearing in
previous bounds. In addition, our most refined bounds have the natural and
desirable property of being stable under rescalings and general translations of
the payoff sequence.
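As a point of reference for the abstract above, here is a minimal sketch of the standard exponentially weighted average forecaster with a multiplicative update, applied to payoffs that may be positive or negative. The learning rate `eta` is fixed here for simplicity; the paper's contribution is precisely the analysis (and a new update rule) that avoids such prior tuning, which this sketch does not reproduce.

```python
import numpy as np

def ewa_forecast(payoffs, eta=0.1):
    """Exponentially weighted average forecaster.

    payoffs: (T, N) array of expert payoffs, possibly negative.
    Returns the forecaster's total expected payoff and its external
    regret against the best single expert in hindsight.
    """
    T, N = payoffs.shape
    log_w = np.zeros(N)             # log-weights, for numerical stability
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                # current mixture over experts
        total += p @ payoffs[t]     # expected payoff this round
        log_w += eta * payoffs[t]   # multiplicative update
    regret = payoffs.sum(axis=0).max() - total
    return total, regret
```

Note that the update depends only on observed payoffs, not on their range, which is the property the paper's parameter-free analysis builds on.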
An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Contrary to previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm.
Comment: submitted to ALT 201
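The Geometric Resampling idea can be illustrated in the simplest setting of K single arms rather than a general combinatorial decision set. The sketch below is an assumption-laden reduction of the paper's method: FPL picks the perturbed loss minimizer, and GR estimates the inverse selection probability by redrawing perturbations until the same arm would be chosen again (the redraw count is geometrically distributed with mean 1/p). The truncation level `m_max` and the exponential perturbations are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def fpl_gr(loss_fn, K, T, eta=0.1, m_max=50):
    """Follow-the-Perturbed-Leader with Geometric Resampling,
    sketched for K single arms (the paper handles arbitrary
    combinatorial decision sets via an offline oracle)."""
    L_hat = np.zeros(K)                        # cumulative loss estimates
    for t in range(T):
        perturb = rng.exponential(1.0 / eta, K)
        arm = int(np.argmin(L_hat - perturb))  # perturbed leader
        loss = loss_fn(t, arm)                 # only this arm's loss is seen
        # Geometric Resampling: redraw perturbations until the same arm
        # is selected again; the count estimates 1 / P(arm chosen).
        m = 1
        while m < m_max:
            z = rng.exponential(1.0 / eta, K)
            if int(np.argmin(L_hat - z)) == arm:
                break
            m += 1
        L_hat[arm] += m * loss                 # unbiased up to truncation
    return L_hat
```

The appeal of this construction is that it never needs the selection probabilities in closed form; it only needs to be able to run the perturbed optimization step repeatedly, which is exactly what an efficient offline oracle provides.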
Nonstochastic Multiarmed Bandits with Unrestricted Delays
We investigate multiarmed bandits with delayed feedback, where the delays need
neither be identical nor bounded. We first prove that "delayed" Exp3 achieves
the O(sqrt((KT + D) ln K)) regret bound conjectured by Cesa-Bianchi et al.
[2016] in the case of variable, but bounded delays. Here, K is the number of
actions and D is the total delay over T rounds. We then introduce a new
algorithm that lifts the requirement of bounded delays by using a wrapper that
skips rounds with excessively large delays. The new algorithm maintains the
same regret bound, but similar to its predecessor requires prior knowledge of D
and T. For this algorithm we then construct a novel doubling scheme that
forgoes the prior knowledge requirement under the assumption that the delays
are available at action time (rather than at loss observation time). This
assumption is satisfied in a broad range of applications, including interaction
with servers and service providers. The resulting oracle regret bound is of
order min over beta of (|S_beta| + beta ln K + (KT + D_beta)/beta), where
|S_beta| is the number of observations with delay exceeding beta, and D_beta is
the total delay of observations with delay below beta. The bound relaxes to
O(sqrt((KT + D) ln K)), but we also provide examples where D_beta is much
smaller than D and the oracle bound has a polynomially better dependence on the
problem parameters.
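The two mechanisms in this abstract (applying importance-weighted Exp3 updates only when delayed feedback arrives, and a wrapper that skips rounds whose delay is too large) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the skipping threshold `skip_above` is taken as a fixed input, whereas the paper tunes it (and removes prior knowledge via a doubling scheme).

```python
import numpy as np

rng = np.random.default_rng(0)

def delayed_exp3(losses, delays, eta=0.05, skip_above=None):
    """Exp3 under delayed feedback: the loss of round t is revealed
    only at round t + delays[t]. If skip_above is set, observations
    with larger delay are discarded entirely (the skipping wrapper)."""
    T, K = losses.shape
    log_w = np.zeros(K)
    pending = {}                        # arrival round -> list of updates
    total = 0.0
    for t in range(T):
        for arm, est in pending.pop(t, []):
            log_w[arm] -= eta * est     # apply feedback as it arrives
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        if skip_above is not None and delays[t] > skip_above:
            continue                    # skipped round: feedback never used
        est = losses[t, arm] / p[arm]   # importance-weighted loss estimate
        pending.setdefault(t + delays[t], []).append((arm, est))
    return total
```

The importance weight uses the probabilities from the round the action was played, not the round the feedback arrives, which keeps the loss estimates unbiased despite the delay.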
Revisiting the Core Ontology and Problem in Requirements Engineering
In their seminal paper in the ACM Transactions on Software Engineering and
Methodology, Zave and Jackson established a core ontology for Requirements
Engineering (RE) and used it to formulate the "requirements problem", thereby
defining what it means to successfully complete RE. Given that stakeholders of
the system-to-be communicate the information needed to perform RE, we show that
Zave and Jackson's ontology is incomplete. It does not cover all types of basic
concerns that the stakeholders communicate. These include beliefs, desires,
intentions, and attitudes. In response, we propose a core ontology that covers
these concerns and is grounded in sound conceptual foundations resting on a
foundational ontology. The new core ontology for RE leads to a new formulation
of the requirements problem that extends Zave and Jackson's formulation. We
thereby establish new standards for what minimum information should be
represented in RE languages and new criteria for determining whether RE has
been successfully completed.
Comment: Appears in the proceedings of the 16th IEEE International
Requirements Engineering Conference, 2008 (RE'08). Best paper award
Competing with stationary prediction strategies
In this paper we introduce the class of stationary prediction strategies and
construct a prediction algorithm that asymptotically performs as well as the
best continuous stationary strategy. We make mild compactness assumptions but
no stochastic assumptions about the environment. In particular, no assumption
of stationarity is made about the environment, and the stationarity of the
considered strategies only means that they do not depend explicitly on time; we
argue that it is natural to consider only stationary strategies even for highly
non-stationary environments.
Comment: 20 pages
Statistical Mechanics of Linear and Nonlinear Time-Domain Ensemble Learning
Conventional ensemble learning combines students in the space domain. In this
paper, however, we combine students in the time domain and call it time-domain
ensemble learning. We analyze, compare, and discuss the generalization
performances regarding time-domain ensemble learning of both a linear model and
a nonlinear model. Analyzing in the framework of online learning using a
statistical mechanical method, we show the qualitatively different behaviors
between the two models. In a linear model, the dynamical behaviors of the
generalization error are monotonic. We analytically show that time-domain
ensemble learning is twice as effective as conventional ensemble learning.
Furthermore, the generalization error of a nonlinear model features
nonmonotonic dynamical behaviors when the learning rate is small. We
numerically show that the generalization performance can be improved remarkably
by using this phenomenon and the divergence of students in the time domain.
Comment: 11 pages, 7 figures
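The core idea of time-domain ensemble learning, combining snapshots of a single student taken at different times rather than several students at the same time, can be illustrated with a toy linear model. The setup below (an online LMS student learning a noisy linear teacher, with ten weight-vector snapshots averaged) is a hypothetical construction for illustration; it does not reproduce the paper's statistical-mechanics analysis or its learning rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_domain_ensemble(n=50, steps=2000, eta=0.1, snapshots=10):
    """Toy time-domain ensemble: one linear student trained online,
    with weight vectors saved at several times and averaged into a
    single ensemble predictor."""
    teacher = rng.standard_normal(n)
    teacher /= np.linalg.norm(teacher)
    w = np.zeros(n)
    saved = []
    for step in range(steps):
        x = rng.standard_normal(n) / np.sqrt(n)
        y = teacher @ x + 0.3 * rng.standard_normal()  # noisy teacher output
        w += eta * (y - w @ x) * x                     # online LMS update
        if step >= steps - snapshots * 100 and step % 100 == 0:
            saved.append(w.copy())                     # time-domain member
    w_ens = np.mean(saved, axis=0)
    err_single = np.linalg.norm(w - teacher) ** 2
    err_ens = np.linalg.norm(w_ens - teacher) ** 2
    return err_single, err_ens
```

Near convergence the single student fluctuates around the teacher due to label noise, and averaging snapshots that are spaced in time damps those fluctuations, which is the intuition behind the improvement the paper quantifies.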
Prediction with Expert Advice under Discounted Loss
We study prediction with expert advice in the setting where the losses are
accumulated with some discounting---the impact of old losses may gradually
vanish. We generalize the Aggregating Algorithm and the Aggregating Algorithm
for Regression to this case, propose a suitable new variant of exponential
weights algorithm, and prove respective loss bounds.
Comment: 26 pages; expanded (2 remarks -> theorems), some misprints corrected
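Discounted loss accumulation is easy to sketch with a generic exponential weights scheme: instead of summing losses, the cumulative loss is damped by a factor alpha each round, so old losses gradually lose their impact. This is an illustrative variant only; the paper's generalized Aggregating Algorithm and its regression version differ in the details, and `eta` and `alpha` here are arbitrary example values.

```python
import numpy as np

def discounted_ew(losses, eta=1.0, alpha=0.9):
    """Exponential weights over discounted cumulative losses.

    losses: (T, N) array of expert losses. Returns the sequence of
    weight vectors used for prediction at each round.
    """
    T, N = losses.shape
    L = np.zeros(N)                      # discounted cumulative losses
    history = []
    for t in range(T):
        p = np.exp(-eta * (L - L.min()))
        p /= p.sum()                     # prediction weights this round
        history.append(p)
        L = alpha * L + losses[t]        # discount, then add fresh losses
    return history
```

With alpha < 1 the effective memory is roughly 1/(1 - alpha) rounds, so the forecaster can track an expert that becomes good late, something undiscounted cumulative losses respond to much more slowly.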
Delay and Cooperation in Nonstochastic Bandits
We study networks of communicating learning agents that cooperate to solve a
common nonstochastic bandit problem. Agents use an underlying communication
network to get messages about actions selected by other agents, and drop
messages that took more than d hops to arrive, where d is a delay parameter. We
introduce Exp3-Coop, a cooperative version of the Exp3 algorithm, and prove
that with K actions and N agents the average per-agent regret after T rounds is
at most of order sqrt((d + 1 + (K/N) alpha_{<=d}) (T ln K)), where alpha_{<=d}
is the independence number of the d-th power of the communication graph G. We
then show that for any connected graph, for d = sqrt(K) the regret bound is
K^(1/4) sqrt(T), strictly better than the minimax regret sqrt(KT) for
noncooperating agents. More informed choices of d lead to bounds which are
arbitrarily close to the full information minimax regret sqrt(T ln K) when G is
dense. When G has sparse components, we show that a variant of Exp3-Coop,
allowing agents to choose their parameters according to their centrality in G,
strictly improves the regret. Finally, as a by-product of our analysis, we
provide the first characterization of the minimax regret for bandit learning
with delays.
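A simplified sketch of the cooperative mechanism: each agent runs Exp3, shares what it played (and each agent's probabilities) with its neighbors, and applies the resulting importance-weighted updates once the messages arrive after d rounds. The importance weight for an arm uses the probability that at least one agent in the neighborhood played it. This is a bare illustration under strong simplifying assumptions (single-hop neighborhoods, a fixed message delay d, no centrality-based parameter tuning), not the paper's full Exp3-Coop.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3_coop(losses, neighbors, eta=0.05, d=1):
    """Cooperative Exp3 sketch: N agents on a communication graph,
    sharing play/loss information that arrives with delay d.

    losses: (T, K) common loss matrix; neighbors: list of neighbor
    lists per agent. Returns each agent's average per-round loss.
    """
    T, K = losses.shape
    N = len(neighbors)
    log_w = np.zeros((N, K))
    pending = {}                              # arrival round -> messages
    totals = np.zeros(N)
    for t in range(T):
        probs = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        plays = np.array([rng.choice(K, p=probs[i]) for i in range(N)])
        totals += losses[t, plays]
        pending[t + d] = (probs, plays, t)
        if t in pending:                      # delayed messages arriving now
            old_probs, old_plays, s = pending.pop(t)
            for i in range(N):
                hood = neighbors[i] + [i]
                for k in set(old_plays[hood]):
                    # probability that at least one neighbor played arm k
                    q = 1.0 - np.prod(1.0 - old_probs[hood, k])
                    log_w[i, k] -= eta * losses[s, k] / q
    return totals / T
```

Because every agent in a neighborhood contributes observations, the effective exploration cost per agent shrinks with the neighborhood size, which is the source of the K/N factor in the regret bound above.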