Learning an Unknown Network State in Routing Games
We study learning dynamics induced by myopic travelers who repeatedly play a
routing game on a transportation network with an unknown state. The state
impacts cost functions of one or more edges of the network. In each stage,
travelers choose their routes according to Wardrop equilibrium based on public
belief of the state. This belief is broadcast by an information system that
observes the edge loads and realized costs on the used edges, and performs a
Bayesian update to the prior stage's belief. We show that the sequence of
public beliefs and edge load vectors generated by the repeated play converges
almost surely. In any rest point, travelers have no incentive to deviate from
the chosen routes and accurately learn the true costs on the used edges.
However, the costs on edges that are not used may not be accurately learned.
Thus, learning can be incomplete in that the edge load vectors at a rest point
and at the complete-information equilibrium can differ. We present some
conditions for complete learning and illustrate situations when such an outcome
is not guaranteed.
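As an illustrative sketch only, and not the authors' model, the Python snippet below performs a Bayesian update of a public belief over a finite set of states from a noisy cost observation on a single edge. The affine cost form, the Gaussian noise, and every name in it (`SLOPES`, `NOISE_STD`, `edge_cost`, `update_belief`) are assumptions made for illustration. When the edge carries no load, no cost is observed and the belief is left unchanged, which mirrors how learning can remain incomplete on unused edges.

```python
import math

# Hypothetical setup: two candidate states, each giving a different slope
# for the cost of a single state-dependent edge (assumed affine cost).
SLOPES = {"low": 1.0, "high": 3.0}
NOISE_STD = 0.5  # assumed Gaussian observation noise


def edge_cost(state, load):
    """Assumed affine cost whose slope depends on the unknown state."""
    return SLOPES[state] * load + 1.0


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, load, observed_cost):
    """Bayes rule over states; an unused edge (load 0) yields no observation."""
    if load == 0:
        return dict(belief)
    unnormalized = {
        s: p * gaussian_pdf(observed_cost, edge_cost(s, load), NOISE_STD)
        for s, p in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)  # observation impossible under every state; keep prior
    return {s: u / total for s, u in unnormalized.items()}


prior = {"low": 0.5, "high": 0.5}
# Edge is used: the observed cost matches the "high" state exactly.
used = update_belief(prior, load=1.0, observed_cost=edge_cost("high", 1.0))
# Edge is unused: nothing is observed, so the belief cannot move.
unused = update_belief(prior, load=0, observed_cost=None)
```

In this toy setting the belief concentrates on the "high" state once the edge is used, while the unused-edge update returns the prior unchanged.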
Learning in Real-Time Search: A Unifying Framework
Real-time search methods are suited for tasks in which the agent is
interacting with an initially unknown environment in real time. In such
simultaneous planning and learning problems, the agent has to select its
actions in a limited amount of time, while sensing only a local part of the
environment centered at the agent's current location. Real-time heuristic search
agents select actions using a limited lookahead search and evaluating the
frontier states with a heuristic function. Over repeated experiences, they
refine heuristic values of states to avoid infinite loops and to converge to
better solutions. The wide spread of such settings in autonomous software and
hardware agents has led to an explosion of real-time search algorithms over the
last two decades. Not only is a potential user confronted with a hodgepodge of
algorithms, but he also faces the choice of control parameters they use. In
this paper we address both problems. The first contribution is an introduction
of a simple three-parameter framework (named LRTS) which extracts the core
ideas behind many existing algorithms. We then prove that LRTA*, epsilon-LRTA*,
SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they
are unified and extended with additional features. Second, we prove
completeness and convergence of any algorithm covered by the LRTS framework.
Third, we prove several upper-bounds relating the control parameters and
solution quality. Finally, we analyze the influence of the three control
parameters empirically in the realistic scalable domains of real-time
navigation on initially unknown maps from a commercial role-playing game as
well as routing in ad hoc sensor networks.
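For orientation, here is a minimal sketch of basic LRTA* with one-step lookahead, one of the algorithms the abstract names as a special case. It is not the LRTS framework itself, and the graph, the zero initial heuristic, and the function name `lrta_star_trial` are hypothetical choices for illustration.

```python
def lrta_star_trial(graph, h, start, goal, max_steps=1000):
    """Run one LRTA* trial with lookahead 1, updating h in place."""
    state, path = start, [start]
    while state != goal and len(path) <= max_steps:
        # Evaluate each neighbor by edge cost plus its current heuristic.
        next_state, best_f = min(
            ((n, cost + h.get(n, 0.0)) for n, cost in graph[state].items()),
            key=lambda pair: pair[1],
        )
        # Learning step: raise h(state) to the best one-step estimate.
        h[state] = max(h.get(state, 0.0), best_f)
        state = next_state
        path.append(state)
    return path


# Hypothetical map: B is a dead end, D is the goal.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1},
    "C": {"A": 1, "D": 1},
    "D": {"C": 1},
}
h = {}  # missing entries are treated as 0, an admissible initial heuristic
first = lrta_star_trial(graph, h, "A", "D")
second = lrta_star_trial(graph, h, "A", "D")
```

On this map the first trial detours through the dead end B, and the second trial, reusing the learned heuristic values, goes directly from A through C to D.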
Distributed Flow Scheduling in an Unknown Environment
Flow scheduling tends to be one of the oldest and most stubborn problems in
networking. It becomes more crucial in next-generation networks, due to
fast-changing link states and the tremendous cost of exploring the global
structure. In such situations, distributed algorithms often dominate. In this paper, we design
a distributed virtual game to solve the flow scheduling problem and then
generalize it to situations of unknown environment, where online learning
schemes are utilized. In the virtual game, we use incentives to stimulate
selfish users to reach a Nash Equilibrium Point which is valid based on the
analysis of the `Price of Anarchy'. In the unknown-environment generalization,
our ultimate goal is to minimize cost in the long run. To balance exploration
of routing cost against exploitation based on limited information, we model
this problem as a multi-armed bandit and combine the newly proposed DSEE with
the virtual game design. Armed with these tools, we obtain a fully distributed
algorithm that ensures regret grows logarithmically with time, which is optimal
in the classic multi-armed bandit problem. Theoretical proof and simulation results both
affirm this claim. To our knowledge, this is the first research to combine
multi-armed bandit with distributed flow scheduling.
Comment: 10 pages, 3 figures, conferenc
Wardrop Equilibrium in Discrete-Time Selfish Routing with Time-Varying Bounded Delays
This paper presents a multi-commodity, discrete-time, distributed and
non-cooperative routing algorithm, which is
proved to converge to an equilibrium in the presence of
heterogeneous, unknown, time-varying but bounded delays.
Under mild assumptions on the latency functions which describe
the cost associated to the network paths, two algorithms are
proposed: the former assumes that each commodity relies only on
measurements of the latencies associated to its own paths; the
latter assumes that each commodity has (at least indirectly) access
to the measures of the latencies of all the network paths. Both
algorithms are proven to drive the system state to an invariant set
which approximates and contains the Wardrop equilibrium,
defined as a network state in which no traffic flow over the
network paths can improve its routing unilaterally, with the latter
achieving a better reconstruction of the Wardrop equilibrium.
Numerical simulations show the effectiveness of the proposed
approach.
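To make the Wardrop condition concrete, the sketch below runs a simple discrete-time rerouting rule on two parallel paths until their latencies equalize. It is a simplified stand-in, not the paper's algorithms: it has a single commodity, ignores measurement delays entirely, and the latency functions, step size, and the name `reroute` are assumptions chosen for illustration.

```python
def latency_1(flow):
    return 2.0 * flow  # assumed latency of path 1


def latency_2(flow):
    return flow + 0.5  # assumed latency of path 2


def reroute(x1, demand=1.0, step=0.1, iterations=200):
    """Shift flow toward the currently faster path in discrete time."""
    for _ in range(iterations):
        # Positive gap means path 1 is slower, so flow leaves path 1.
        gap = latency_1(x1) - latency_2(demand - x1)
        x1 = min(demand, max(0.0, x1 - step * gap))
    return x1


# Start with all traffic on path 1 and let the rule rebalance it.
x1 = reroute(x1=1.0)
```

With these assumed latencies the flow on path 1 settles at half the demand, where both paths have equal latency and no traffic can improve by switching unilaterally.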
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
Multi-Layer Cyber-Physical Security and Resilience for Smart Grid
The smart grid is a large-scale complex system that integrates communication
technologies with the physical layer operation of the energy systems. Security
and resilience mechanisms by design are important to guarantee the system's
operation. This chapter provides a layered perspective of the
smart grid security and discusses game and decision theory as a tool to model
the interactions among system components and the interaction between attackers
and the system. We discuss game-theoretic applications and challenges in the
design of cross-layer robust and resilient controllers and secure network
routing protocols at the data communication and networking layers, and the
challenges of information security at the management layer of the grid. The chapter will
discuss the future directions of using game-theoretic tools in addressing
multi-layer security issues in the smart grid.
Comment: 16 page