8,306 research outputs found
Recommended from our members
An MDP-Based Approach to Online Mechanism Design
Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for di erent outcomes. We consider the problem of maximizing the total long-term value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium.Engineering and Applied Science
Recommended from our members
Approximately Efficient Online Mechanism Design
Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate Bayesian-Nash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse.Engineering and Applied Science
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs
A Survey on Delay-Aware Resource Control for Wireless Systems --- Large Deviation Theory, Stochastic Lyapunov Drift and Distributed Stochastic Learning
In this tutorial paper, a comprehensive survey is given on several major
systematic approaches in dealing with delay-aware control problems, namely the
equivalent rate constraint approach, the Lyapunov stability drift approach and
the approximate Markov Decision Process (MDP) approach using stochastic
learning. These approaches essentially embrace most of the existing literature
regarding delay-aware resource control in wireless systems. They have their
relative pros and cons in terms of performance, complexity and implementation
issues. For each of the approaches, the problem setup, the general solution and
the design methodology are discussed. Applications of these approaches to
delay-aware resource allocation are illustrated with examples in single-hop
wireless networks. Furthermore, recent results regarding delay-aware multi-hop
routing designs in general multi-hop networks are elaborated. Finally, the
delay performance of the various approaches are compared through simulations
using an example of the uplink OFDMA systems.Comment: 58 pages, 8 figures; IEEE Transactions on Information Theory, 201
An Online Decision-Theoretic Pipeline for Responder Dispatch
The problem of dispatching emergency responders to service traffic accidents,
fire, distress calls and crimes plagues urban areas across the globe. While
such problems have been extensively looked at, most approaches are offline.
Such methodologies fail to capture the dynamically changing environments under
which critical emergency response occurs, and therefore, fail to be implemented
in practice. Any holistic approach towards creating a pipeline for effective
emergency response must also look at other challenges that it subsumes -
predicting when and where incidents happen and understanding the changing
environmental dynamics. We describe a system that collectively deals with all
these problems in an online manner, meaning that the models get updated with
streaming data sources. We highlight why such an approach is crucial to the
effectiveness of emergency response, and present an algorithmic framework that
can compute promising actions for a given decision-theoretic model for
responder dispatch. We argue that carefully crafted heuristic measures can
balance the trade-off between computational time and the quality of solutions
achieved and highlight why such an approach is more scalable and tractable than
traditional approaches. We also present an online mechanism for incident
prediction, as well as an approach based on recurrent neural networks for
learning and predicting environmental features that affect responder dispatch.
We compare our methodology with prior state-of-the-art and existing dispatch
strategies in the field, which show that our approach results in a reduction in
response time with a drastic reduction in computational time.Comment: Appeared in ICCPS 201
Deception in Optimal Control
In this paper, we consider an adversarial scenario where one agent seeks to
achieve an objective and its adversary seeks to learn the agent's intentions
and prevent the agent from achieving its objective. The agent has an incentive
to try to deceive the adversary about its intentions, while at the same time
working to achieve its objective. The primary contribution of this paper is to
introduce a mathematically rigorous framework for the notion of deception
within the context of optimal control. The central notion introduced in the
paper is that of a belief-induced reward: a reward dependent not only on the
agent's state and action, but also adversary's beliefs. Design of an optimal
deceptive strategy then becomes a question of optimal control design on the
product of the agent's state space and the adversary's belief space. The
proposed framework allows for deception to be defined in an arbitrary control
system endowed with a reward function, as well as with additional
specifications limiting the agent's control policy. In addition to defining
deception, we discuss design of optimally deceptive strategies under
uncertainties in agent's knowledge about the adversary's learning process. In
the latter part of the paper, we focus on a setting where the agent's behavior
is governed by a Markov decision process, and show that the design of optimally
deceptive strategies under lack of knowledge about the adversary naturally
reduces to previously discussed problems in control design on partially
observable or uncertain Markov decision processes. Finally, we present two
examples of deceptive strategies: a "cops and robbers" scenario and an example
where an agent may use camouflage while moving. We show that optimally
deceptive strategies in such examples follow the intuitive idea of how to
deceive an adversary in the above settings
- …