
    Learning an Unknown Network State in Routing Games

    We study learning dynamics induced by myopic travelers who repeatedly play a routing game on a transportation network with an unknown state. The state impacts the cost functions of one or more edges of the network. In each stage, travelers choose their routes according to a Wardrop equilibrium based on the public belief about the state. This belief is broadcast by an information system that observes the edge loads and the realized costs on the used edges, and performs a Bayesian update of the previous stage's belief. We show that the sequences of public beliefs and edge load vectors generated by the repeated play converge almost surely. At any rest point, travelers have no incentive to deviate from their chosen routes and accurately learn the true costs on the used edges. However, the costs on unused edges may not be learned accurately. Thus, learning can be incomplete in that the edge load vector at a rest point can differ from the complete-information equilibrium. We present some conditions for complete learning and illustrate situations in which such an outcome is not guaranteed.
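The belief update described in this abstract can be illustrated with a minimal sketch. It assumes a finite set of candidate states and Gaussian observation noise on the realized cost of a used edge. The function name `bayes_update`, the state labels, the noise model, and all numbers are hypothetical and not taken from the paper.

```python
import math


def bayes_update(belief, predicted_costs, observed_cost, noise_std=1.0):
    """Return the posterior belief over states after one cost observation.

    belief:          dict mapping state -> prior probability
    predicted_costs: dict mapping state -> mean cost of the used edge in that state
    observed_cost:   realized cost reported for the used edge
    noise_std:       assumed standard deviation of Gaussian observation noise
    """
    # Unnormalized posterior: prior times Gaussian likelihood of the observation.
    weights = {
        s: p * math.exp(-0.5 * ((observed_cost - predicted_costs[s]) / noise_std) ** 2)
        for s, p in belief.items()
    }
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


prior = {"normal": 0.5, "accident": 0.5}

# The used edge is state-dependent: the observation is informative.
informative = bayes_update(prior, {"normal": 1.0, "accident": 3.0}, observed_cost=2.9)

# The used edge costs the same in both states: the belief does not move.
uninformative = bayes_update(prior, {"normal": 2.0, "accident": 2.0}, observed_cost=2.3)
```

The second call hints at why learning can be incomplete: when only edges whose cost does not depend on the state are used, the observations carry no information about the state and the belief stays at the prior.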

    Learning in Real-Time Search: A Unifying Framework

    Real-time search methods are suited for tasks in which the agent is interacting with an initially unknown environment in real time. In such simultaneous planning and learning problems, the agent has to select its actions in a limited amount of time, while sensing only a local part of the environment centered at the agent's current location. Real-time heuristic search agents select actions using a limited lookahead search and evaluating the frontier states with a heuristic function. Over repeated experiences, they refine heuristic values of states to avoid infinite loops and to converge to better solutions. The prevalence of such settings in autonomous software and hardware agents has led to an explosion of real-time search algorithms over the last two decades. Not only is a potential user confronted with a hodgepodge of algorithms, but they also face the choice of the control parameters these algorithms use. In this paper we address both problems. The first contribution is the introduction of a simple three-parameter framework (named LRTS) which extracts the core ideas behind many existing algorithms. We then prove that the LRTA*, epsilon-LRTA*, SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they are unified and extended with additional features. Second, we prove completeness and convergence of any algorithm covered by the LRTS framework. Third, we prove several upper bounds relating the control parameters and solution quality. Finally, we analyze the influence of the three control parameters empirically in the realistic scalable domains of real-time navigation on initially unknown maps from a commercial role-playing game as well as routing in ad hoc sensor networks.
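As an illustration of the heuristic-refinement idea, here is a minimal sketch of plain LRTA* with a lookahead of one, which the abstract names as a special case of LRTS. It is not the LRTS framework itself and does not model its three parameters. The function name `lrta_star_trial`, the toy graph, and the zero initial heuristic are assumptions made for the example.

```python
def lrta_star_trial(graph, h, start, goal, max_steps=1000):
    """Run one LRTA* trial with lookahead 1, updating h in place.

    graph: dict mapping state -> dict of {neighbor: edge cost}
    h:     dict mapping state -> current heuristic estimate (missing means 0)
    Returns the list of states visited from start to goal.
    """
    state = start
    path = [state]
    for _ in range(max_steps):
        if state == goal:
            return path
        # Evaluate each neighbor by edge cost plus its heuristic value.
        best_next, best_f = min(
            ((n, cost + h.get(n, 0.0)) for n, cost in graph[state].items()),
            key=lambda pair: pair[1],
        )
        # Learning step: raise h(state) to the best one-step estimate.
        h[state] = max(h.get(state, 0.0), best_f)
        state = best_next
        path.append(state)
    raise RuntimeError("goal not reached within max_steps")


# Toy map: C is a dead end next to A; the goal G lies behind B.
graph = {
    "A": {"C": 1.0, "B": 1.0},
    "B": {"A": 1.0, "G": 1.0},
    "C": {"A": 1.0},
    "G": {"B": 1.0},
}
h = {}
first = lrta_star_trial(graph, h, "A", "G")
second = lrta_star_trial(graph, h, "A", "G")
```

On this toy map the first trial wanders into the dead end, while the second trial, reusing the learned `h`, goes straight to the goal. This is the "refine heuristic values over repeated experiences" behavior the abstract describes.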

    Distributed Flow Scheduling in an Unknown Environment

    Flow scheduling is one of the oldest and most stubborn problems in networking. It becomes more crucial in next-generation networks, due to fast-changing link states and the tremendous cost of exploring the global structure. In such situations, distributed algorithms often dominate. In this paper, we design a distributed virtual game to solve the flow scheduling problem and then generalize it to situations with an unknown environment, where online learning schemes are utilized. In the virtual game, we use incentives to stimulate selfish users to reach a Nash Equilibrium Point which is valid based on the analysis of the `Price of Anarchy'. In the unknown-environment generalization, our ultimate goal is the minimization of cost in the long run. In order to balance exploration of routing cost against exploitation based on limited information, we model this problem as a multi-armed bandit and combine the newly proposed DSEE with the virtual game design. Armed with these tools, we find a fully distributed algorithm that ensures logarithmic growth of regret with time, which is optimal in the classic multi-armed bandit problem. Theoretical proof and simulation results both affirm this claim. To our knowledge, this is the first research to combine multi-armed bandits with distributed flow scheduling. Comment: 10 pages, 3 figures, conferenc
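The exploration-exploitation trade-off mentioned in this abstract can be sketched with the classic UCB1 index policy. Note that this is a substitute technique chosen for brevity: the paper uses DSEE, which is not reproduced here, and UCB1 is shown only because it also achieves logarithmic regret in the classic bandit setting. The function name `ucb1`, the reward-maximizing formulation (a cost could be mapped to a reward in [0, 1]), and the fixed arm rewards are assumptions.

```python
import math


def ucb1(pull, n_arms, rounds):
    """Play a bandit with UCB1 and return how often each arm was chosen.

    pull:   callable taking an arm index and returning a reward in [0, 1]
    n_arms: number of arms (for example, candidate routes)
    rounds: total number of pulls
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Initialization: try every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest mean plus confidence bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Two hypothetical routes with fixed (noise-free) rewards, for a reproducible run.
route_rewards = [0.2, 0.8]
counts = ucb1(lambda arm: route_rewards[arm], n_arms=2, rounds=500)
```

The confidence bonus shrinks as an arm is pulled more often, so the worse route is still sampled occasionally but increasingly rarely.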

    Wardrop Equilibrium in Discrete-Time Selfish Routing with Time-Varying Bounded Delays

    This paper presents a multi-commodity, discrete-time, distributed and non-cooperative routing algorithm, which is proved to converge to an equilibrium in the presence of heterogeneous, unknown, time-varying but bounded delays. Under mild assumptions on the latency functions which describe the cost associated with the network paths, two algorithms are proposed: the former assumes that each commodity relies only on measurements of the latencies associated with its own paths; the latter assumes that each commodity has (at least indirectly) access to the measurements of the latencies of all the network paths. Both algorithms are proven to drive the system state to an invariant set which approximates and contains the Wardrop equilibrium, defined as a network state in which no traffic flow over the network paths can improve its routing unilaterally, with the latter achieving a better reconstruction of the Wardrop equilibrium. Numerical simulations show the effectiveness of the proposed approach.
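To make the Wardrop equilibrium definition concrete, here is a minimal closed-form sketch for a toy case: one commodity, two parallel links, and affine latencies of the form a * x + b with non-negative slopes. It is not the paper's discrete-time algorithm and ignores delays entirely. The function name `two_link_wardrop` and all parameter values are assumptions.

```python
def two_link_wardrop(a1, b1, a2, b2, demand):
    """Return equilibrium flows (x1, x2) on two parallel links.

    Link i has latency a_i * x_i + b_i with a_i >= 0, and x1 + x2 = demand.
    At equilibrium, used links have equal latency and unused links are no cheaper.
    """
    if a1 + a2 == 0:
        # Constant latencies: all flow takes the cheaper link (ties go to link 1).
        x1 = demand if b1 <= b2 else 0.0
    else:
        # Solve a1 * x1 + b1 = a2 * (demand - x1) + b2, then clip to [0, demand].
        x1 = (a2 * demand + b2 - b1) / (a1 + a2)
        x1 = min(max(x1, 0.0), demand)
    return x1, demand - x1


# Both links used: latencies equalize at 0.75.
x1, x2 = two_link_wardrop(a1=1.0, b1=0.0, a2=1.0, b2=0.5, demand=1.0)

# Pigou-style example: link 2 has constant latency 1 and ends up unused.
p1, p2 = two_link_wardrop(a1=1.0, b1=0.0, a2=0.0, b2=1.0, demand=1.0)
```

In the first case no traveler can lower its latency by switching links, which is the "no unilateral improvement" condition in the abstract.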

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.

    Multi-Layer Cyber-Physical Security and Resilience for Smart Grid

    The smart grid is a large-scale complex system that integrates communication technologies with the physical layer operation of the energy systems. Security and resilience mechanisms by design are important to provide guaranteed operation of the system. This chapter provides a layered perspective of smart grid security and discusses game and decision theory as a tool to model the interactions among system components and the interaction between attackers and the system. We discuss game-theoretic applications and challenges in the design of cross-layer robust and resilient controllers, secure network routing protocols at the data communication and networking layers, and the challenges of information security at the management layer of the grid. The chapter also discusses future directions for using game-theoretic tools in addressing multi-layer security issues in the smart grid. Comment: 16 page