11 research outputs found

    Paradoxes in Learning and the Marginal Value of Information

    Selecting Computations: Theory and Applications

    Sequential decision problems are often approximately solvable by simulating possible future action sequences. Metalevel decision procedures have been developed for selecting which action sequences to simulate, based on estimating the expected improvement in decision quality that would result from any particular simulation; an example is the recent work on using bandit algorithms to control Monte Carlo tree search in the game of Go. In this paper we develop a theoretical basis for metalevel decisions in the statistical framework of Bayesian selection problems, arguing (as others have done) that this is more appropriate than the bandit framework. We derive a number of basic results applicable to Monte Carlo selection problems, including the first finite sampling bounds for optimal policies in certain cases; we also provide a simple counterexample to the intuitive conjecture that an optimal policy will necessarily reach a decision in all cases. We then derive heuristic approximations in both Bayesian and distribution-free settings and demonstrate their superiority to bandit-based heuristics in one-shot decision problems and in Go. Comment: 10 pages, UAI 201

    Valuing Information in Complex Systems: An Integrated Analytical Approach to Achieve Optimal Performance in the Beer Distribution Game

    Even seemingly simple systems can produce complex dynamics, which leads management professionals to develop tools for training, monitoring, and improving performance. Management simulators provide useful insights about human behavior and interactions, while computational and informational decision support tools offer opportunities to reduce inconsistencies, errors, and non-optimal human choices, particularly for complex systems that involve multiple decision makers, uncertainty, variability, and time. We use the context of a popular management simulator that teaches students about the bullwhip effect (i.e., the beer distribution game) to explore an integrated decision analytic, control theory, and system dynamics approach to the game that recognizes the value of available (imperfect) information and considers the value of perfect information to provide the optimal strategy. Using a discrete event simulation, we characterize optimal decisions and overall team scores for the situation of actual available information and perfect information. We describe our implementation of the strategy in the field to win the 2007 beer game world championship played at the 25th conference of the International System Dynamics Society. This paper seeks to demonstrate that better understanding of the system and use of available information leads to significantly lower expected costs than identified in prior studies. Understanding complex systems and using information optimally may increase system stability and significantly improve performance, in some cases even without better information than already available.

    Top-k selection with pairwise comparisons

    In this work we consider active, pairwise top-k selection, the problem of identifying the highest quality subset of given size from a set of alternatives, based on the information collected from noisy, sequentially chosen pairwise comparisons. We adapt two well-known Bayesian sequential sampling techniques, the Knowledge Gradient policy and the Optimal Computing Budget Allocation framework, for the pairwise setting and compare their performance on a range of empirical tests. We demonstrate that these methods are able to match or outperform the current state-of-the-art racing algorithm approach.

    Active Sensing for Partially Observable Markov Decision Processes

    Context information on a smart phone can be used to tailor applications for specific situations (e.g. provide tailored routing advice based on location, gas prices and traffic). However, typical context-aware smart phone applications use very limited context information such as user identity, location and time. In the future, smart phones will need to decide which of a wide range of sensors to gather information from in order to best accommodate user needs and preferences in a given context. In this thesis, we present a model for active sensor selection within decision-making processes, in which observational features are selected based on their longer-term impact on the decisions made by the smart phone. This thesis formulates the problem as a partially observable Markov decision process (POMDP), and proposes a non-myopic solution to the problem using a state-of-the-art approximate planning algorithm, Symbolic Perseus. We have tested our method on three small example domains, comparing different policy types, discount factors and cost settings. The experimental results showed that the proposed approach delivers a better policy when sensors are costly, while also providing faster policy computation with less memory usage.

    Models for Retail Inventory Management with Demand Learning

    Matching supply with demand is key to success in the volatile and competitive retail business. To this end, retailers seek to improve their inventory decisions by learning demand from various sources. More interestingly, retailers' inventory decisions may in turn obscure the demand information they observe. This dissertation examines three problems in retail contexts that involve interactions between inventory management and demand learning. First, motivated by the unprecedented adverse impact of the 2008 financial crisis on retailers, we consider the inventory control problem of a firm experiencing potential demand shifts whose timings are known but whose impacts are not known. We establish structural results about the optimal policies, construct novel cost lower bounds based on particular information relaxations, and propose near-optimal heuristic policies derived from those bounds. We then consider the optimal allocation of a limited inventory for fashion retailers to conduct "merchandise tests" prior to the main selling season as a demand learning approach. We identify a key tradeoff between the quantity and quality of demand observations. We also find that the visibility into the timing of each sales transaction has a pivotal impact on the optimal allocation decisions and the value of merchandise tests. Finally, we consider a retailer selling an experiential product to consumers who learn product quality from reviews generated by previous buyers. The retailer maximizes profit by choosing whether to offer the product for sale to each arriving customer. We characterize the optimal product offering policies to be of threshold type. Interestingly, we find that it can be optimal for the firm to withhold inventory and not to offer the product even if an arriving customer is willing to buy for sure. We numerically demonstrate that personalized offering is most valuable when the price is high and customers are optimistic but uncertain about product quality. Doctor of Philosoph