Trend Detection based Regret Minimization for Bandit Problems
We study a variation of the classical multi-armed bandits problem. In this
problem, the learner has to make a sequence of decisions, picking from a fixed
set of choices. In each round, she receives as feedback only the loss incurred
from the chosen action. Conventionally, this problem has been studied when
losses of the actions are drawn from an unknown distribution or when they are
adversarial. In this paper, we study this problem when the losses of the
actions also satisfy certain structural properties and, in particular, exhibit a
trend structure. When this is true, we show that using trend
detection, we can achieve regret of order with
respect to a switching strategy for the version of the problem where a single
action is chosen in each round and when actions
are chosen each round. This guarantee is a significant improvement over the
conventional benchmark. Our approach can, as a framework, be applied in
combination with various well-known bandit algorithms, like Exp3. For both
versions of the problem, we also give regret guarantees for the
anytime setting, i.e., when the length of the choice sequence is not
known in advance. Finally, we pinpoint the advantages of our method by
comparing it to other well-known strategies.
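As a rough illustration of the kind of base learner the abstract mentions, the following is a minimal sketch of loss-based Exp3. It is not the paper's trend-detection framework; it only shows the bandit feedback loop (one observed loss per round) that such a framework would wrap. The class and function names, the toy environment with fixed per-arm losses, and the learning-rate tuning `eta = sqrt(2 ln K / (T K))` are all assumptions made for this example.

```python
import math
import random


class Exp3:
    """Loss-based Exp3 for K arms with losses in [0, 1] (illustrative sketch)."""

    def __init__(self, n_arms, eta, seed=None):
        self.n_arms = n_arms
        self.eta = eta  # learning rate; the tuning used in simulate() is an assumption
        self.loss_estimates = [0.0] * n_arms  # importance-weighted cumulative losses
        self.rng = random.Random(seed)

    def probabilities(self):
        # Subtract the minimum before exponentiating for numerical stability.
        base = min(self.loss_estimates)
        weights = [math.exp(-self.eta * (est - base)) for est in self.loss_estimates]
        total = sum(weights)
        return [w / total for w in weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, loss):
        # Only the chosen arm's loss is observed; dividing by its selection
        # probability keeps the cumulative estimate unbiased.
        self.loss_estimates[arm] += loss / prob


def simulate(arm_losses, horizon, seed=0):
    """Run Exp3 against fixed per-arm losses (a hypothetical toy environment)."""
    k = len(arm_losses)
    eta = math.sqrt(2.0 * math.log(k) / (horizon * k))
    learner = Exp3(k, eta, seed=seed)
    pulls = [0] * k
    for _ in range(horizon):
        arm, prob = learner.select()
        learner.update(arm, prob, arm_losses[arm])
        pulls[arm] += 1
    return pulls, learner.probabilities()


if __name__ == "__main__":
    pulls, probs = simulate([0.1, 0.9, 0.9], horizon=5000)
    print("pulls per arm:", pulls)
    print("final probabilities:", probs)
```

In this toy run the low-loss arm should end up being pulled far more often than the others. A switching benchmark like the one in the abstract would instead compare against a sequence of arms that changes over time, which plain Exp3 is not designed to track.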
Sequential Design for Ranking Response Surfaces
We propose and analyze sequential design methods for the problem of ranking
several response surfaces. Namely, given response surfaces over a
continuous input space, the aim is to efficiently find the index of
the minimal response across the entire input space. The response surfaces are not
known and have to be noisily sampled one-at-a-time. This setting is motivated
by stochastic control applications and requires joint experimental design both
in space and response-index dimensions. To generate sequential design
heuristics we investigate stepwise uncertainty reduction approaches, as well as
sampling based on posterior classification complexity. We also make connections
between our continuous-input formulation and the discrete framework of pure
regret in multi-armed bandits. To model the response surfaces we utilize
kriging surrogates. Several numerical examples using both synthetic data and an
epidemics control problem are provided to illustrate our approach and the
efficacy of respective adaptive designs.
Comment: 26 pages, 7 figures (updated several sections and figures
A Bayesian multi-armed bandit algorithm for dynamic end-to-end routing in SDN-based networks with piecewise-stationary rewards
To handle the exponential growth of data-intensive network edge services and automatically solve new challenges in routing management, machine learning is steadily being incorporated into software-defined networking solutions. In this line, the article presents the design of a piecewise-stationary Bayesian multi-armed bandit approach for the online optimum end-to-end dynamic routing of data flows in the context of programmable networking systems. This learning-based approach has been analyzed with simulated and emulated data, showing the proposal’s ability to sequentially and proactively self-discover the end-to-end routing path with minimal delay among a considerable number of alternatives, even when facing abrupt changes in transmission delay distributions due to both variable congestion levels on path network devices and dynamic delays to transmission links.
EMM: Energy-Aware Mobility Management for Mobile Edge Computing in Ultra Dense Networks
Merging mobile edge computing (MEC) functionality with the dense deployment
of base stations (BSs) provides enormous benefits, such as close-proximity,
low-latency access to computing resources. However, the envisioned integration
creates many new challenges, among which mobility management (MM) is a critical
one. Simply applying existing radio access oriented MM schemes leads to poor
performance mainly due to the co-provisioning of radio access and computing
services of the MEC-enabled BSs. In this paper, we develop a novel user-centric
energy-aware mobility management (EMM) scheme, in order to optimize the delay
due to both radio access and computation, under the long-term energy
consumption constraint of the user. Based on Lyapunov optimization and
multi-armed bandit theories, EMM works in an online fashion without future
system state information, and effectively handles the imperfect system state
information. Theoretical analysis explicitly takes radio handover and
computation migration cost into consideration and proves a bounded deviation on
both the delay performance and energy consumption compared to the oracle
solution with exact and complete future system information. The proposed
algorithm also effectively handles the scenario in which candidate BSs randomly
switch on/off during the offloading process of a task. Simulations show that
the proposed algorithms can achieve close-to-optimal delay performance while
satisfying the user energy consumption constraint.
Comment: 14 pages, 6 figures, an extended version of the paper submitted to
IEEE JSA
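To make the Lyapunov-optimization idea in the abstract more concrete, here is a minimal sketch of a generic drift-plus-penalty rule with a virtual energy queue. It is not the EMM algorithm: it assumes delays and energy costs are known exactly, and it ignores the bandit learning, handover cost, and BS on/off behavior the paper handles. The function names, the trade-off weight `v`, and the two candidate stations with their delay and energy values are hypothetical.

```python
def choose_station(options, queue, v):
    """Pick the option minimizing v * delay + queue * energy.

    options: list of (name, delay, energy) tuples (hypothetical values).
    queue:   current virtual queue length (accumulated energy overspend).
    v:       weight trading delay against constraint violation (assumed knob).
    """
    return min(options, key=lambda opt: v * opt[1] + queue * opt[2])


def run(options, energy_budget, v, rounds):
    """Simulate the rule and return (average delay, average energy, final queue)."""
    queue = 0.0  # virtual queue: grows when a round spends more than the budget
    total_delay = 0.0
    total_energy = 0.0
    for _ in range(rounds):
        _, delay, energy = choose_station(options, queue, v)
        total_delay += delay
        total_energy += energy
        # Standard virtual-queue update for a long-term average constraint.
        queue = max(queue + energy - energy_budget, 0.0)
    return total_delay / rounds, total_energy / rounds, queue


if __name__ == "__main__":
    stations = [("fast", 1.0, 3.0), ("slow", 3.0, 1.0)]
    avg_delay, avg_energy, final_queue = run(stations, energy_budget=2.0, v=1.5, rounds=1000)
    print(avg_delay, avg_energy, final_queue)
```

Because the queue starts at zero, the total energy spent can exceed the budget by at most the final queue length, so the average energy stays within `final_queue / rounds` of `energy_budget`. A larger `v` favors lower delay at the price of a longer queue.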