39 research outputs found

    Modeling Human Performance in Restless Bandits with Particle Filters

    Uncertainty and Exploration in a Restless Bandit Problem

    Decision making in noisy and changing environments requires a fine balance between exploiting knowledge about good courses of action and exploring the environment in order to improve upon this knowledge. We present an experiment on a restless bandit task in which participants made repeated choices between options whose average rewards changed over time. Comparing a number of computational models of participants' behavior in this task, we find evidence that a substantial number of participants balanced exploration and exploitation by considering the probability that an option offers the maximum reward out of all the available options.
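As a rough illustration of the quantity mentioned in this abstract, the probability that an option offers the maximum reward can be estimated by sampling. The sketch below assumes independent Gaussian beliefs about each option's mean reward; the function name `prob_maximum`, the Gaussian form, and the example numbers are illustrative assumptions, not the models fitted in the paper.

```python
import random


def prob_maximum(means, sds, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(option i has the highest reward).

    Assumes (hypothetically) independent Gaussian beliefs
    N(means[i], sds[i]**2) about each option's current mean reward.
    """
    rng = random.Random(seed)
    wins = [0] * len(means)
    for _ in range(n_samples):
        # Draw one plausible reward level per option from its belief.
        draws = [rng.gauss(m, s) for m, s in zip(means, sds)]
        # Credit the option whose draw is largest in this sample.
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]


if __name__ == "__main__":
    # Option 0 looks best on average; option 2 is poorly known (large sd).
    print(prob_maximum(means=[1.0, 0.5, 0.0], sds=[0.5, 0.5, 2.0]))
```

Choosing each option with a probability equal to its estimate gives one simple way to balance exploration and exploitation: uncertain options keep a non-negligible chance of being chosen even when their expected reward is lower.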

    Structure Learning in Human Sequential Decision-Making

    Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than concluding that humans are suboptimal, we argue that the learning problem they face is more complex, in that it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.

    Performance optimization for unmanned vehicle systems

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2008, by Jerome Le Ny. Includes bibliographical references (p. 149-157).
    Technological advances in the area of unmanned vehicles are opening new possibilities for creating teams of vehicles performing complex missions with some degree of autonomy. Perhaps the most spectacular example of these advances concerns the increasing deployment of unmanned aerial vehicles (UAVs) in military operations. Unmanned Vehicle Systems (UVS) are mainly used in Information, Surveillance and Reconnaissance missions (ISR). In this context, the vehicles typically move about a low-threat environment which is sufficiently simple to be modeled successfully. This thesis develops tools for optimizing the performance of UVS performing ISR missions, assuming such a model. First, in a static environment, the UVS operator typically requires that a vehicle visit a set of waypoints once or repetitively, with no a priori specified order. Minimizing the length of the tour traveled by the vehicle through these waypoints requires solving a Traveling Salesman Problem (TSP). We study the TSP for the Dubins' vehicle, which models the limited turning radius of fixed wing UAVs. In contrast to previously proposed approaches, our algorithms determine an ordering of the waypoints that depends on the model of the vehicle dynamics. We evaluate the performance gains obtained by incorporating such a model in the mission planner. With a dynamic model of the environment, the decision-making level of the UVS also needs to solve a sensor scheduling problem. We consider M UAVs monitoring N > M sites with independent Markovian dynamics, and treat two important examples arising in this and other contexts, such as wireless channel or radar waveform selection. In the first example, the sensors must detect events arising at sites modeled as two-state Markov chains. In the second example, the sites are assumed to be Gaussian linear time invariant (LTI) systems and the sensors must keep the best possible estimate of the state of each site. We first present a bound on the achievable performance which can be computed efficiently by a convex program, involving linear matrix inequalities in the LTI case. We give closed-form formulas for a feedback index policy proposed by Whittle. Comparing the performance of this policy to the bound, it is seen to perform very well in simulations. For the LTI example, we propose new open-loop periodic switching policies whose performance matches the bound. Ultimately, we need to solve the task scheduling and motion planning problems simultaneously. We first extend the approach developed for the sensor scheduling problems to the case where switching penalties model the path planning component. Finally, we propose a new modeling approach, based on fluid models for stochastic networks, to obtain insight into more complex spatiotemporal resource allocation problems. In particular, we give a necessary and sufficient stabilizability condition for the fluid approximation of the problem of harvesting data from a set of spatially distributed queues with spatially varying transmission rates using a mobile server.

    Advanced Sensor and Dynamics Models with an Application to Sensor Management

    Towards Thompson Sampling for Complex Bayesian Reasoning

    Papers III, IV, and VI are not available as part of the dissertation due to copyright. Thompson Sampling (TS) is a state-of-the-art algorithm for bandit problems set in a Bayesian framework. Both the theoretical foundation and the empirical efficiency of TS are well explored for plain bandit problems. However, the Bayesian underpinning of TS means that TS could potentially be applied to other, more complex problems beyond the bandit problem, if suitable Bayesian structures can be found. The objective of this thesis is the development and analysis of TS-based schemes for more complex optimization problems, founded on Bayesian reasoning. We address several complex optimization problems where the previous state of the art relies on a relatively myopic perspective on the problem. These include stochastic searching on the line, the Goore game, the knapsack problem, travel time estimation, and equipartitioning. Instead of employing Bayesian reasoning to obtain a solution, the previous approaches rely on carefully engineered rules. In brief, we recast each of these optimization problems in a Bayesian framework, introducing dedicated TS-based solution schemes. For all of the addressed problems, the results show that besides being more effective, the TS-based approaches we introduce are also capable of solving more adverse versions of the problems, such as dealing with stochastic liars.
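For readers unfamiliar with the plain bandit setting this abstract starts from, a minimal sketch of Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors may help. It covers only the standard textbook case, not the TS-based schemes developed in the thesis; the function name `thompson_sampling`, the arm probabilities, and the round count are illustrative assumptions.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Run Thompson Sampling on a Bernoulli bandit; return pulls per arm.

    `true_probs` are hypothetical success probabilities, hidden from the
    learner and used here only to simulate rewards.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1] * k   # Beta posterior parameter: 1 + observed failures
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Play the arm whose sampled rate is highest.
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8]))
```

Because each arm is played with the posterior probability that it is the best one, exploration fades naturally as evidence accumulates.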

    Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

    Future wireless networks have substantial potential in terms of supporting a broad range of complex compelling applications in both military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have had great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims to assist readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks. Comment: 46 pages, 22 fig