3 research outputs found

    Optimistic planning for the stochastic knapsack problem

    The stochastic knapsack problem is a stochastic resource allocation problem that arises frequently and yet is exceptionally hard to solve. We derive and study an optimistic planning algorithm specifically designed for the stochastic knapsack problem. Unlike other optimistic planning algorithms for MDPs, our algorithm, OpStoK, avoids the use of discounting and is adaptive to the amount of resources available. We achieve this behavior by means of a concentration inequality that simultaneously applies to capacity and reward estimates. Crucially, we are able to guarantee that the aforementioned confidence regions hold collectively over all time steps by an application of Doob’s inequality. We demonstrate that the method returns an ε-optimal solution to the stochastic knapsack problem with high probability. To the best of our knowledge, our algorithm is the first which provides such guarantees for the stochastic knapsack problem. Furthermore, our algorithm is an anytime algorithm and will return a good solution even if stopped prematurely. This is particularly important given the difficulty of the problem. We also provide theoretical conditions to guarantee OpStoK does not expand all policies and demonstrate favorable performance in a simple experimental setting.

    Sequential decision problems in online education

    This thesis is concerned with the study of sequential decision problems motivated by the challenge of selecting questions to give to students in an online educational environment. In online education there is the potential to develop personalized and adaptive learning environments, where students can receive individualized sequences of questions which update as the student is observed to be struggling or flourishing. In order to achieve this personalization, we must learn about how good each question is, while simultaneously giving students good questions. Multi-armed bandits are a popular technique for sequential decision making under uncertainty. Due to their online nature and their ability to balance the trade-off between exploitation and exploration, multi-armed bandits lend themselves naturally to this problem of adaptively selecting questions in education software. However, due to the complexity of the educational problem, standard approaches to multi-armed bandits cannot be applied directly. In this thesis, variants of the multi-armed bandit problem specifically motivated by the issues arising in the educational domain are considered. Particular focus will be placed on the statistical and mathematical foundations of such approaches.