
    Sample Efficient Bayesian Reinforcement Learning

    Artificial Intelligence (AI) has been an active field of research for over a century now. The research field of AI may be grouped into the various tasks expected of an intelligent agent, two major ones being learning and inference, and planning. The act of acquiring and storing new knowledge is known as learning, while inference refers to the act of extracting conclusions from the agent’s limited knowledge base; the two are tightly coupled through the design of that knowledge base. The process of deciding on long-term actions or plans given the agent’s current knowledge is called planning.

    Reinforcement Learning (RL) brings these two tasks together by posing a seemingly benign question: “How to act optimally in an unknown environment?” This requires the agent to learn about its environment as well as plan actions given its current knowledge of it. In RL, the environment is represented by a mathematical model, and we associate an intrinsic value with the actions that the agent may choose.

    In this thesis, we present a novel Bayesian algorithm for the problem of RL. Bayesian RL is a widely explored area of research but is constrained by scalability and performance issues. We provide first steps towards a rigorous analysis of these types of algorithms. Bayesian algorithms are characterized by the belief they maintain over their unknowns, which is updated based on the collected evidence. This differs from the traditional approach to RL both in problem formulation and in formal guarantees. Our algorithm combines aspects of planning and learning through its inherent Bayesian formulation, and it does so in a more scalable fashion, with formal PAC guarantees. We also give insights into the application of the Bayesian framework to the estimation of model and value, in joint work on Bayesian backward induction for RL.
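    The central mechanism described here, maintaining a belief over the unknown environment, updating it with collected evidence, and planning against it, can be made concrete with a small posterior-sampling sketch in Python. This is a generic illustration under simplifying assumptions (a tiny tabular MDP, Dirichlet and empirical-mean beliefs, and resampling a model every step), not the thesis's algorithm or its PAC analysis.

    # Hypothetical sketch of Bayesian RL: keep a belief over an unknown tabular MDP,
    # sample a model from the posterior, plan in the sample, act, and update the belief.
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions, gamma, horizon = 5, 2, 0.95, 1000

    # True (unknown) environment, used only to generate experience.
    true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    true_R = rng.uniform(0, 1, size=(n_states, n_actions))

    # Belief over the unknowns: Dirichlet counts for transitions, running means for rewards.
    trans_counts = np.ones((n_states, n_actions, n_states))   # Dirichlet(1, ..., 1) prior
    reward_sum = np.zeros((n_states, n_actions))
    visit_counts = np.zeros((n_states, n_actions))

    def value_iteration(P, R, tol=1e-6):
        """Plan in a sampled MDP: return a greedy policy for (P, R)."""
        V = np.zeros(n_states)
        while True:
            Q = R + gamma * P @ V          # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
            V_new = Q.max(axis=1)
            if np.abs(V_new - V).max() < tol:
                return Q.argmax(axis=1)
            V = V_new

    s = 0
    for t in range(horizon):
        # Learning and planning in one loop: sample a model from the belief, plan, act.
        P_sample = np.array([[rng.dirichlet(trans_counts[si, ai]) for ai in range(n_actions)]
                             for si in range(n_states)])
        R_sample = reward_sum / np.maximum(visit_counts, 1)
        a = value_iteration(P_sample, R_sample)[s]

        # Interact with the true environment and update the belief with the evidence.
        s_next = rng.choice(n_states, p=true_P[s, a])
        r = true_R[s, a] + rng.normal(scale=0.05)
        trans_counts[s, a, s_next] += 1
        reward_sum[s, a] += r
        visit_counts[s, a] += 1
        s = s_next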

    Learning by Investing: Evidence from Venture Capital

    To understand the investment behavior of venture capital (VC) investors, this paper estimates a dynamic model of learning. Behavior reflecting both learning from past investments (exploitation) and anticipated future learning (exploration) is found to be prevalent, and the model's additional predictions about success rates and investment speeds are confirmed empirically. Learning is important, since it can create informational frictions, and it has potential implications for VCs' investments and organizations. VCs are found to internalize the value of learning, and this may help promote exploration beyond the levels sustained in standard capital markets, which is socially valuable.

    Keywords: Venture capital; Learning; Multi-armed bandit model
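    The exploration/exploitation trade-off behind the multi-armed bandit view of investing can be illustrated with a small Python toy, where each "arm" is a class of investments with an unknown success probability and a Beta posterior is updated after every outcome. The arm count, success probabilities, and Thompson-sampling rule below are illustrative assumptions, not the paper's estimated structural model.

    # Toy bandit sketch of learning by investing: sample a belief about each arm,
    # invest in the one that looks best under the sample, then update the belief.
    import numpy as np

    rng = np.random.default_rng(1)
    true_success_prob = np.array([0.10, 0.25, 0.40])   # unknown to the investor
    alpha = np.ones(3)                                  # Beta posterior: successes + 1
    beta = np.ones(3)                                   # Beta posterior: failures + 1

    for investment in range(500):
        # Exploration and exploitation in one step (Thompson sampling).
        sampled = rng.beta(alpha, beta)
        arm = int(np.argmax(sampled))

        # Observe the investment's success or failure and update the belief.
        success = rng.random() < true_success_prob[arm]
        alpha[arm] += success
        beta[arm] += 1 - success

    print("posterior mean success rates:", alpha / (alpha + beta))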

    Local Bandit Approximation for Optimal Learning Problems

    In general, procedures for determining Bayes-optimal adaptive controls for Markov decision processes (MDP's) require a prohibitive amount of computation---the optimal learning problem is intractable. This paper proposes an approximate approach in which bandit processes are used to model, in a certain "local" sense, a given MDP. Bandit processes constitute an important subclass of MDP's, and have optimal learning strategies (defined in terms of Gittins indices) that can be computed relatively efficiently. Thus, one scheme for achieving approximately-optimal learning for general MDP's proceeds by taking actions suggested by strategies that are optimal with respect to local bandit models.

    1 INTRODUCTION
    Watkins [1989] has defined optimal learning as: "... the process of collecting and using information during learning in an optimal manner, so that the learner makes the best possible decisions at all stages of learning: learning itself is regarded as a multistage decision process, and lea...
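    The Gittins index the abstract relies on can be approximated numerically for the simplest case: a Bernoulli arm with a Beta(a, b) posterior, where the index is the constant retirement reward that makes the decision-maker indifferent between retiring and continuing to pull the arm. The Python sketch below uses bisection over that retirement reward and a horizon-truncated dynamic program; the discount factor, truncation depth, and tolerance are assumptions for illustration, not the paper's local-bandit construction.

    # Rough numerical sketch of a Gittins index for a Beta-Bernoulli arm.
    from functools import lru_cache

    GAMMA = 0.9        # discount factor (assumed for illustration)
    MAX_DEPTH = 50     # horizon truncation for the dynamic program

    def gittins_index(a, b, tol=1e-4):
        """Approximate the Gittins index of a Beta(a, b) Bernoulli arm by bisection."""

        @lru_cache(maxsize=None)
        def value(a_, b_, lam, depth):
            # Stopping problem: retire forever at reward lam, or pull once and continue.
            retire = lam / (1 - GAMMA)
            if depth == MAX_DEPTH:
                return retire
            p = a_ / (a_ + b_)                      # posterior mean success probability
            pull = p * (1 + GAMMA * value(a_ + 1, b_, lam, depth + 1)) \
                 + (1 - p) * GAMMA * value(a_, b_ + 1, lam, depth + 1)
            return max(retire, pull)

        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            lam = (lo + hi) / 2
            # Indifference test: if pulling still beats retiring at lam, the index is higher.
            if value(a, b, lam, 0) > lam / (1 - GAMMA) + 1e-12:
                lo = lam
            else:
                hi = lam
        return (lo + hi) / 2

    if __name__ == "__main__":
        # The index exceeds the posterior mean (0.5) because continuing also has learning value.
        print(gittins_index(1, 1))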