
39 research outputs found

Modeling Human Performance in Restless Bandits with Particle Filters

Publication venue: 'Purdue University (bepress)'

Uncertainty and Exploration in a Restless Bandit Problem

Author: Acuna, Ahn, Behrens, Berry, Busemeyer, Cohen, Daw, Erev, Gittins, Gluck, Granmo, Gupta, Kalman, Kim, Knox, Luce, Papadimitriou, Steyvers, Sutton, Thompson, Tversky, Viappiani, Wagenmakers, Whittle, Yechiam, Yi
Publication venue: 'Wiley'
Publication date: 21/04/2015

Decision making in noisy and changing environments requires a fine balance between exploiting knowledge about good courses of action and exploring the environment in order to improve upon this knowledge. We present an experiment on a restless bandit task in which participants made repeated choices between options for which the average rewards changed over time. Comparing a number of computational models of participants' behavior in this task, we find evidence that a substantial number of them balanced exploration and exploitation by considering the probability that an option offers the maximum reward out of all the available options.
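
The choice rule described in this abstract, selecting options according to the probability that each one offers the maximum reward, can be illustrated with a minimal sketch. The code below is not taken from the paper: it assumes independent Gaussian beliefs about each option's reward, and the function name `prob_max`, the example means and standard deviations, and the Monte Carlo approach are all illustrative choices.

```python
import random


def prob_max(means, sds, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(option i yields the highest reward).

    Assumes independent Gaussian beliefs about each option's reward;
    this is an illustrative assumption, not the paper's model.
    """
    rng = random.Random(seed)
    wins = [0] * len(means)
    for _ in range(n_samples):
        # Draw one plausible reward per option from the current beliefs.
        draws = [rng.gauss(m, s) for m, s in zip(means, sds)]
        # Credit the option whose draw was the largest.
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


# Hypothetical beliefs about three options (mean, standard deviation).
probs = prob_max(means=[1.0, 1.2, 0.5], sds=[0.5, 1.0, 0.2])
print(probs)
```

A decision maker who picks each option with these probabilities explores uncertain options more often than one who always picks the highest mean, because a wide belief leaves more chance of producing the largest draw.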

Crossref

UCL Discovery

Structure Learning in Human Sequential Decision-Making

Author: A Fel'dbaum, A Gelman, A Johnson, A Smith, AC Courville, AD Horowitz, AJ Yu, C Anderson, C Watkins, D Acuna, D Heckerman, DA Braun, Daniel E. Acuña, I Erev, J Anderson, J Banks, JB Tenenbaum, JC Gittins, L Kaelbling, M Steyvers, MD Lee, MJA Strens, MS Yi, N Gans, ND Daw, P Poupart, P Whittle, Paul Schrater, R Dearden, R Howard, RE Bellman, RE Neapolitan, RJ Meyer, RS Sutton, SJ Gershman, TEJ Behrens, Tim Behrens, W Edwards, W Schultz, Y Brackbill, Y Sakai
Publication venue: Public Library of Science
Publication date: 01/12/2010

Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than being suboptimal, we argue that the learning problem humans face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Performance optimization for unmanned vehicle systems

Author: Jerome Le Ny
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2008. Includes bibliographical references (p. 149-157). Technological advances in the area of unmanned vehicles are opening new possibilities for creating teams of vehicles performing complex missions with some degree of autonomy. Perhaps the most spectacular example of these advances concerns the increasing deployment of unmanned aerial vehicles (UAVs) in military operations. Unmanned Vehicle Systems (UVS) are mainly used in Information, Surveillance and Reconnaissance missions (ISR). In this context, the vehicles typically move about a low-threat environment which is sufficiently simple to be modeled successfully. This thesis develops tools for optimizing the performance of UVS performing ISR missions, assuming such a model. First, in a static environment, the UVS operator typically requires that a vehicle visit a set of waypoints once or repetitively, with no a priori specified order. Minimizing the length of the tour traveled by the vehicle through these waypoints requires solving a Traveling Salesman Problem (TSP). We study the TSP for the Dubins' vehicle, which models the limited turning radius of fixed wing UAVs. In contrast to previously proposed approaches, our algorithms determine an ordering of the waypoints that depends on the model of the vehicle dynamics. We evaluate the performance gains obtained by incorporating such a model in the mission planner. With a dynamic model of the environment the decision making level of the UVS also needs to solve a sensor scheduling problem. We consider M UAVs monitoring N > M sites with independent Markovian dynamics, and treat two important examples arising in this and other contexts, such as wireless channel or radar waveform selection. In the first example, the sensors must detect events arising at sites modeled as two-state Markov chains. In the second example, the sites are assumed to be Gaussian linear time invariant (LTI) systems and the sensors must keep the best possible estimate of the state of each site. We first present a bound on the achievable performance which can be computed efficiently by a convex program, involving linear matrix inequalities in the LTI case. We give closed-form formulas for a feedback index policy proposed by Whittle. Comparing the performance of this policy to the bound, it is seen to perform very well in simulations. For the LTI example, we propose new open-loop periodic switching policies whose performance matches the bound. Ultimately, we need to solve the task scheduling and motion planning problems simultaneously. We first extend the approach developed for the sensor scheduling problems to the case where switching penalties model the path planning component. Finally, we propose a new modeling approach, based on fluid models for stochastic networks, to obtain insight into more complex spatiotemporal resource allocation problems. In particular, we give a necessary and sufficient stabilizability condition for the fluid approximation of the problem of harvesting data from a set of spatially distributed queues with spatially varying transmission rates using a mobile server. By Jerome Le Ny. Ph.D.

CiteSeerX

DSpace@MIT

PolyPublie

Advanced Sensor and Dynamics Models with an Application to Sensor Management

Author: Wolfgang Koch
Publication venue: 'IntechOpen'
Publication date: 01/02/2009

IntechOpen

Towards Thompson Sampling for Complex Bayesian Reasoning

Author: Glimsdal Sondre
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Papers III, IV, and VI are not available as part of the dissertation due to copyright. Thompson Sampling (TS) is a state-of-the-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS are well explored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex problems beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state of the art relies on a relatively myopic perspective on the problem. These include stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, existing approaches rely on carefully engineered rules. In brief, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS-based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS-based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.
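
For readers unfamiliar with the "plain bandit" setting this abstract starts from, the following is a minimal sketch of Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors. It is a textbook illustration and not one of the thesis's schemes; the function name `thompson_bernoulli`, the reward probabilities, and the number of rounds are assumptions chosen for the example.

```python
import random


def thompson_bernoulli(true_probs, n_rounds=2000, seed=0):
    """Run Beta-Bernoulli Thompson Sampling and return pulls per arm.

    true_probs are hypothetical reward probabilities that are hidden
    from the learner and used only to simulate rewards.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # Beta(1, 1) prior: alpha parameter per arm
    failures = [1] * k   # Beta(1, 1) prior: beta parameter per arm
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample one plausible reward probability per arm from its posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = samples.index(max(samples))
        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.5, 0.8])
print(pulls)
```

Because each arm is chosen only when its posterior sample is the largest, pulls concentrate on the best arm as evidence accumulates, while poorly explored arms still get occasional tries.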

Agder University Research Archive

Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

Author: Chen Kwang-Cheng, Hanzo Lajos, Jiang Chunxiao, Ren Yong, Wang Jingjing, Zhang Haijun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/01/2019

Future wireless networks have substantial potential to support a broad range of complex, compelling applications in both military and civilian fields, where users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have had great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims to help readers clarify the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks. Comment: 46 pages, 22 fig

arXiv.org e-Print Archive

Southampton (e-Prints Soton)