11,243 research outputs found

    Stochastic Online Shortest Path Routing: The Value of Feedback

    Full text link
    This paper studies online shortest path routing over multi-hop networks. Link costs or delays are time-varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret (the cumulative difference of expected delay) between the path chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. Three algorithms, with a trade-off between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments.Comment: 18 page

    Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

    Full text link
    In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^1/p) regret when the moments of the reward distributions exist up to the pth order for 1<p<=2 and O(T^1/(1+p/2)) for p>2. With the knowledge of an upperbound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, MAB with unknown Markov dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning, and the dominating set problems under unknown random weights.Comment: 22 pages, 2 figure

    Local Tomography of Large Networks under the Low-Observability Regime

    Full text link
    This article studies the problem of reconstructing the topology of a network of interacting agents via observations of the state-evolution of the agents. We focus on the large-scale network setting with the additional constraint of partialpartial observations, where only a small fraction of the agents can be feasibly observed. The goal is to infer the underlying subnetwork of interactions and we refer to this problem as locallocal tomographytomography. In order to study the large-scale setting, we adopt a proper stochastic formulation where the unobserved part of the network is modeled as an Erd\"{o}s-R\'enyi random graph, while the observable subnetwork is left arbitrary. The main result of this work is establishing that, under this setting, local tomography is actually possible with high probability, provided that certain conditions on the network model are met (such as stability and symmetry of the network combination matrix). Remarkably, such conclusion is established under the lowlow-observabilityobservability regimeregime, where the cardinality of the observable subnetwork is fixed, while the size of the overall network scales to infinity.Comment: To appear in IEEE Transactions on Information Theor

    Distributed Flow Scheduling in an Unknown Environment

    Full text link
    Flow scheduling tends to be one of the oldest and most stubborn problems in networking. It becomes more crucial in the next generation network, due to fast changing link states and tremendous cost to explore the global structure. In such situation, distributed algorithms often dominate. In this paper, we design a distributed virtual game to solve the flow scheduling problem and then generalize it to situations of unknown environment, where online learning schemes are utilized. In the virtual game, we use incentives to stimulate selfish users to reach a Nash Equilibrium Point which is valid based on the analysis of the `Price of Anarchy'. In the unknown-environment generalization, our ultimate goal is the minimization of cost in the long run. In order to achieve balance between exploration of routing cost and exploitation based on limited information, we model this problem based on Multi-armed Bandit Scenario and combined newly proposed DSEE with the virtual game design. Armed with these powerful tools, we find a totally distributed algorithm to ensure the logarithmic growing of regret with time, which is optimum in classic Multi-armed Bandit Problem. Theoretical proof and simulation results both affirm this claim. To our knowledge, this is the first research to combine multi-armed bandit with distributed flow scheduling.Comment: 10 pages, 3 figures, conferenc

    Online Learning of Energy Consumption for Navigation of Electric Vehicles

    Get PDF
    Energy efficient navigation constitutes an important challenge in electric vehicles, due to their limited battery capacity. We employ a Bayesian approach to model the energy consumption at road segments for efficient navigation. In order to learn the model parameters, we develop an online learning framework and investigate several exploration strategies such as Thompson Sampling and Upper Confidence Bound. We then extend our online learning framework to the multi-agent setting, where multiple vehicles adaptively navigate and learn the parameters of the energy model. We analyze Thompson Sampling and establish rigorous regret bounds on its performance in the single-agent and multi-agent settings, through an analysis of the algorithm under batched feedback. Finally, we demonstrate the performance of our methods via experiments on several real-world city road networks

    Online Learning of Energy Consumption for Navigation of Electric Vehicles

    Get PDF
    Energy efficient navigation constitutes an important challenge in electric vehicles, due to their limited battery capacity. We employ a Bayesian approach to model the energy consumption at road segments for efficient navigation. In order to learn the model parameters, we develop an online learning framework and investigate several exploration strategies such as Thompson Sampling and Upper Confidence Bound. We then extend our online learning framework to the multi-agent setting, where multiple vehicles adaptively navigate and learn the parameters of the energy model. We analyze Thompson Sampling and establish rigorous regret bounds on its performance in the single-agent and multi-agent settings, through an analysis of the algorithm under batched feedback. Finally, we demonstrate the performance of our methods via experiments on several real-world city road networks.Comment: Extension of arXiv:2003.0141
    • …
    corecore