Stochastic Online Shortest Path Routing: The Value of Feedback
This paper studies online shortest path routing over multi-hop networks. Link
costs or delays are time-varying and modeled by independent and identically
distributed random processes, whose parameters are initially unknown. The
parameters, and hence the optimal path, can only be estimated by routing
packets through the network and observing the realized delays. Our aim is to
find a routing policy that minimizes the regret (the cumulative difference of
expected delay) between the path chosen by the policy and the unknown optimal
path. We formulate the problem as a combinatorial bandit optimization problem
and consider several scenarios that differ in where routing decisions are made
and in the information available when making the decisions. For each scenario,
we derive a tight asymptotic lower bound on the regret that has to be satisfied
by any online routing policy. These bounds help us to understand the
performance improvements we can expect when (i) taking routing decisions at
each hop rather than at the source only, and (ii) observing per-link delays
rather than end-to-end path delays. In particular, we show that (i) is of no
use while (ii) can have a spectacular impact. Three algorithms, with a
trade-off between computational complexity and performance, are proposed. The
regret upper bounds of these algorithms improve over those of the existing
algorithms, and they significantly outperform state-of-the-art algorithms in
numerical experiments.
Comment: 18 pages
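Below is a minimal sketch of routing with per-link (semi-bandit) feedback. It assumes a hypothetical three-path toy network, exponentially distributed link delays, and a generic CUCB-style lower-confidence index. It is not one of the paper's three algorithms. The accumulated pseudo-regret is the sum over rounds of the expected delay of the chosen path minus that of the optimal path, which matches the regret definition in the abstract. `PATHS`, `MEAN_DELAY`, `link_index` and `route` are illustrative names.

```python
import math
import random

# Hypothetical toy network: each candidate path is a list of link ids.
PATHS = [[0, 1], [2, 3], [0, 4, 3]]
# Assumed mean link delays, unknown to the policy; realised delays are exponential.
MEAN_DELAY = [1.0, 1.0, 0.6, 0.6, 0.2]


def link_index(mean, count, t):
    """Optimistic (lower-confidence) estimate of a link's mean delay."""
    if count == 0:
        return -math.inf  # paths with an untried link are chosen first
    return mean - math.sqrt(1.5 * math.log(t) / count)


def route(horizon, seed=0):
    """Return (pseudo-regret, per-path usage) after `horizon` routing rounds."""
    rng = random.Random(seed)
    counts = [0] * len(MEAN_DELAY)
    means = [0.0] * len(MEAN_DELAY)
    path_delay = [sum(MEAN_DELAY[l] for l in p) for p in PATHS]
    best = min(path_delay)
    regret, uses = 0.0, [0] * len(PATHS)
    for t in range(1, horizon + 1):
        # Choose the path whose summed optimistic link delays is smallest.
        k = min(range(len(PATHS)),
                key=lambda i: sum(link_index(means[l], counts[l], t) for l in PATHS[i]))
        uses[k] += 1
        regret += path_delay[k] - best
        # Per-link feedback: the delay of every traversed link is observed.
        for l in PATHS[k]:
            d = rng.expovariate(1.0 / MEAN_DELAY[l])
            counts[l] += 1
            means[l] += (d - means[l]) / counts[l]
    return regret, uses
```

With per-link feedback, one traversal of path `[0, 4, 3]` also refines the estimate for link 3, which path `[2, 3]` shares. This kind of information sharing is one intuition for why observing per-link delays can help so much.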
Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with
unknown reward models. At each time, a player selects one arm to play, aiming
to maximize the total expected reward over a horizon of length T. An approach
based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is
developed for constructing sequential arm selection policies. It is shown that
for all light-tailed reward distributions, DSEE achieves the optimal
logarithmic order of the regret, where regret is defined as the total expected
reward loss against the ideal case with known reward models. For heavy-tailed
reward distributions, DSEE achieves $O(T^{1/p})$ regret when the moments of the
reward distributions exist up to the $p$-th order for $1 < p \le 2$, and
$O(T^{1/(1+p/2)})$ regret for $p > 2$. With the knowledge of an upper bound on a
finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret
order. The proposed DSEE approach complements existing work on MAB by providing
corresponding results for general reward distributions. Furthermore, with a
clearly defined tunable parameter (the cardinality of the exploration sequence),
the DSEE approach is easily extendable to variations of MAB, including MAB with
various objectives, decentralized MAB with multiple players and incomplete
reward observations under collisions, MAB with unknown Markov dynamics, and
combinatorial MAB with dependent arms that often arise in network optimization
problems such as the shortest path, minimum spanning tree, and dominating
set problems under unknown random weights.
Comment: 22 pages, 2 figures
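The deterministic exploration schedule can be sketched for Bernoulli arms as follows. This is a simplified reading of DSEE that rests on several assumptions chosen for illustration:

- A counting rule: explore while fewer than N⌈w log t⌉ exploration slots have occurred.
- Round-robin exploration across the arms.
- Exploitation of the arm with the best exploration-only sample mean.

Here `w` stands in for the tunable cardinality parameter mentioned above. The logarithmic guarantee presumably requires `w` to be large enough for the reward distributions, and the value used here is arbitrary. The function name `dsee` and its parameters are hypothetical.

```python
import math
import random


def dsee(probs, horizon, w=10.0, seed=0):
    """DSEE-style policy for Bernoulli arms (sketch).

    Slot t is an exploration slot whenever some arm is still unsampled or
    fewer than N * ceil(w * log t) exploration slots have occurred so far.
    Exploration plays arms round-robin; exploitation plays the arm with the
    highest sample mean computed from exploration rewards only.
    """
    rng = random.Random(seed)
    n = len(probs)
    sums = [0.0] * n   # rewards collected during exploration slots
    counts = [0] * n   # exploration samples per arm
    plays = [0] * n
    explored = 0
    total = 0.0
    for t in range(1, horizon + 1):
        explore = min(counts) == 0 or explored < n * math.ceil(w * math.log(t))
        if explore:
            arm = explored % n  # round-robin over arms
        else:
            arm = max(range(n), key=lambda i: sums[i] / counts[i])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        plays[arm] += 1
        total += reward
        if explore:
            sums[arm] += reward
            counts[arm] += 1
            explored += 1
    return plays, explored, total
```

The exploration slots depend only on t and never on observed rewards, so the schedule is fixed in advance. This is plausibly one reason the approach extends easily to settings such as decentralized players.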
Local Tomography of Large Networks under the Low-Observability Regime
This article studies the problem of reconstructing the topology of a network
of interacting agents via observations of the state-evolution of the agents. We
focus on the large-scale network setting with the additional constraint of
partial observations, where only a small fraction of the agents can be
feasibly observed. The goal is to infer the underlying subnetwork of
interactions, and we refer to this problem as local tomography. In order to
study the large-scale setting, we adopt a proper stochastic formulation where
the unobserved part of the network is modeled as an Erdős-Rényi random
graph, while the observable subnetwork is left arbitrary. The main result of
this work is establishing that, under this setting, local tomography is
actually possible with high probability, provided that certain conditions on
the network model are met (such as stability and symmetry of the network
combination matrix). Remarkably, this conclusion is established under the
low-observability regime, where the cardinality of the observable
subnetwork is fixed while the size of the overall network scales to infinity.
Comment: To appear in IEEE Transactions on Information Theory
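Below is a minimal sketch of the estimation step, built on several assumptions:

- A first-order linear diffusion model y_n = A y_{n-1} + x_n with i.i.d. Gaussian noise.
- A Granger-type estimator that forms lag-0 and lag-1 sample covariances R_0 and R_1 over the observed agents only and returns R_1 R_0^{-1}.
- A final thresholding step that turns the estimate into a set of edges.

The model, the estimator, the threshold, and all function names (`simulate`, `invert`, `granger_estimate`, `support`) are illustrative. The article's exact estimator and its Erdős-Rényi analysis are not reproduced. The code uses pure Python to stay self-contained, so it is only suitable for small matrices.

```python
import random


def simulate(A, steps, seed=0):
    """Simulate y_n = A y_{n-1} + x_n with i.i.d. standard Gaussian noise x_n."""
    rng = random.Random(seed)
    n = len(A)
    y = [0.0] * n
    traj = []
    for _ in range(steps):
        y = [sum(A[i][j] * y[j] for j in range(n)) + rng.gauss(0.0, 1.0) for i in range(n)]
        traj.append(y)
    return traj


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def granger_estimate(traj, observed):
    """Estimate the observed block of A as R1 R0^{-1}, using observed agents only."""
    ys = [[y[i] for i in observed] for y in traj]
    k, m = len(observed), len(ys) - 1
    R0 = [[sum(ys[t][i] * ys[t][j] for t in range(m)) / m for j in range(k)] for i in range(k)]
    R1 = [[sum(ys[t + 1][i] * ys[t][j] for t in range(m)) / m for j in range(k)] for i in range(k)]
    R0_inv = invert(R0)
    return [[sum(R1[i][q] * R0_inv[q][j] for q in range(k)) for j in range(k)] for i in range(k)]


def support(A_hat, threshold):
    """Declare an edge j -> i wherever |A_hat[i][j]| exceeds the threshold."""
    return {(i, j) for i, row in enumerate(A_hat) for j, v in enumerate(row) if abs(v) > threshold}
```

Passing a strict subset such as `observed=[0, 1]` gives the partially observed version of the problem. The latent agents then bias the estimate. The abstract's claim is that, under its stated conditions, this bias does not prevent recovery with high probability as the network grows. A toy of this size cannot exhibit that regime.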
Distributed Flow Scheduling in an Unknown Environment
Flow scheduling is one of the oldest and most stubborn problems in
networking. It becomes even more crucial in next-generation networks, due to
rapidly changing link states and the tremendous cost of exploring the global
structure. In such situations, distributed algorithms often dominate. In this paper, we design
a distributed virtual game to solve the flow scheduling problem and then
generalize it to situations of unknown environment, where online learning
schemes are utilized. In the virtual game, we use incentives to steer
selfish users toward a Nash equilibrium point, whose quality is justified by an
analysis of the Price of Anarchy. In the unknown-environment generalization,
our ultimate goal is the minimization of cost in the long run. In order to
achieve a balance between exploring routing costs and exploiting the limited
information gathered so far, we model this problem as a multi-armed bandit
and combine the recently proposed DSEE approach with the virtual game design.
With these tools, we obtain a fully distributed algorithm whose regret grows
logarithmically with time, which is the optimal order for the classic
multi-armed bandit problem. Theoretical proof and simulation results both
affirm this claim. To our knowledge, this is the first work to combine
multi-armed bandits with distributed flow scheduling.
Comment: 10 pages, 3 figures, conference
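The Price of Anarchy is the ratio between the total cost at a (worst) Nash equilibrium and the optimal total cost. The example below is only a textbook illustration and does not model the paper's virtual game or incentive scheme. In Pigou's two-link network the ratio is 4/3: selfish users all take the load-dependent link, while the social optimum splits traffic evenly. The function name is illustrative.

```python
def pigou_price_of_anarchy(grid=1000):
    """Price of Anarchy of Pigou's two-link network with unit demand.

    Link A has constant cost 1; link B has cost equal to its load x.
    """
    def social_cost(x):
        # x = fraction of traffic on link B: each unit on B pays x, each on A pays 1.
        return x * x + (1.0 - x) * 1.0

    # Selfish (Wardrop/Nash) flow: B is never costlier than A while x <= 1,
    # so all traffic ends up on B.
    nash = social_cost(1.0)
    # Social optimum by grid search over the split.
    optimum = min(social_cost(i / grid) for i in range(grid + 1))
    return nash / optimum
```

Incentive schemes such as the one described above aim to narrow exactly this gap between selfish and socially optimal behaviour.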
Online Learning of Energy Consumption for Navigation of Electric Vehicles
Energy-efficient navigation constitutes an important challenge in electric
vehicles, due to their limited battery capacity. We employ a Bayesian approach
to model the energy consumption at road segments for efficient navigation. In
order to learn the model parameters, we develop an online learning framework
and investigate several exploration strategies such as Thompson Sampling and
Upper Confidence Bound. We then extend our online learning framework to the
multi-agent setting, where multiple vehicles adaptively navigate and learn the
parameters of the energy model. We analyze Thompson Sampling and establish
rigorous regret bounds on its performance in the single-agent and multi-agent
settings, through an analysis of the algorithm under batched feedback. Finally,
we demonstrate the performance of our methods via experiments on several
real-world city road networks.
Comment: Extension of arXiv:2003.0141
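Below is a minimal Thompson Sampling sketch for the single-agent case. It assumes an independent Gaussian prior on each road segment's mean energy, Gaussian observation noise with known standard deviation, and a hypothetical network with three enumerated candidate routes. Each trip runs three steps:

1. Sample segment energies from the posterior.
2. Drive the cheapest route under that sample.
3. Apply the conjugate Gaussian update to every traversed segment.

The multi-agent and batched-feedback settings analysed in the paper are not modelled. All names (`ROUTES`, `TRUE_ENERGY`, `thompson_navigation`) and numbers are illustrative.

```python
import random

# Hypothetical road network: candidate routes as lists of segment ids.
ROUTES = [[0, 1], [2, 3], [0, 3]]
TRUE_ENERGY = [2.0, 2.0, 1.0, 1.5]  # assumed mean energy per segment (unknown)
NOISE_SD = 0.5                      # assumed known observation noise


def thompson_navigation(trips, prior_mean=2.0, prior_sd=1.0, seed=0):
    """Thompson Sampling with an independent Gaussian-Gaussian model per segment."""
    rng = random.Random(seed)
    n = len(TRUE_ENERGY)
    # Posterior over each segment's mean energy, stored as (mean, precision).
    post_mean = [prior_mean] * n
    post_prec = [1.0 / prior_sd ** 2] * n
    noise_prec = 1.0 / NOISE_SD ** 2
    uses = [0] * len(ROUTES)
    for _ in range(trips):
        # 1) Sample one plausible mean energy per segment from its posterior.
        sample = [rng.gauss(post_mean[e], post_prec[e] ** -0.5) for e in range(n)]
        # 2) Drive the route that is cheapest under the sampled energies.
        r = min(range(len(ROUTES)), key=lambda i: sum(sample[e] for e in ROUTES[i]))
        uses[r] += 1
        # 3) Observe each traversed segment's energy; conjugate Gaussian update.
        for e in ROUTES[r]:
            obs = rng.gauss(TRUE_ENERGY[e], NOISE_SD)
            new_prec = post_prec[e] + noise_prec
            post_mean[e] = (post_prec[e] * post_mean[e] + noise_prec * obs) / new_prec
            post_prec[e] = new_prec
    return uses, post_mean
```

Enumerating a few routes keeps the example short. On a real road network one would instead run a shortest-path solver on the sampled energies. That needs care if sampled values can be negative, since Dijkstra's algorithm requires non-negative weights.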