5 research outputs found

    Controlled Use of Subgoals in Reinforcement Learning

    Get PDF

    Planning under risk and uncertainty

    Get PDF
    This thesis concentrates on the optimization of large-scale management policies under conditions of risk and uncertainty. In paper I, we address the problem of solving large-scale spatial and temporal natural resource management problems. To model these types of problems, the framework of graph-based Markov decision processes (GMDPs) can be used. Two algorithms for computation of high-quality management policies are presented: the first is based on approximate linear programming (ALP) and the second is based on mean-field approximation and approximate policy iteration (MF-API). The applicability and efficiency of the algorithms were demonstrated by their ability to compute near-optimal management policies for two large-scale management problems. It was concluded that the two algorithms compute policies of similar quality. However, the MF-API algorithm should be used when both the policy and the expected value of the computed policy are required, while the ALP algorithm may be preferred when only the policy is required. In paper II, a number of reinforcement learning algorithms are presented that can be used to compute management policies for GMDPs when the transition function can only be simulated because its explicit formulation is unknown. Studies of the efficiency of the algorithms for three management problems led us to conclude that some of these algorithms were able to compute near-optimal management policies. In paper III, we used the GMDP framework to optimize long-term forestry management policies under stochastic wind-damage events. The model was demonstrated by a case study of an estate consisting of 1,200 ha of forest land, divided into 623 stands. We concluded that managing the estate according to the risk of wind damage increased the expected net present value (NPV) of the whole estate only slightly, less than 2%, under different wind-risk assumptions. Most of the stands were managed in the same manner as when the risk of wind damage was not considered. However, the analysis rests on properties of the model that need to be refined before definite conclusions can be drawn
    corecore