
    Rollout algorithms for stochastic scheduling problems

    Title from caption. "April 1998." Includes bibliographical references (p. 26). Supported in part by the Air Force Office of Scientific Research under contract F49620-97-C-0013. By Dimitri P. Bertsekas and David A. Castañon.

    Job Shop Production Planning under Uncertainty: A Monte Carlo Rollout Approach

    The Monte Carlo Rollout (MCR) method is a novel approach for approximately solving combinatorial optimization problems under uncertainty. It combines ideas from rollout algorithms for combinatorial optimization with Monte Carlo Tree Search from game playing. This paper reports the results of an investigation into applying MCR to a job shop scheduling problem. The quality of the MCR method depends on its model parameters, search depth and search width, which are strongly linked to the process parameters. These dependencies are analyzed through a series of simulations. The paper also examines whether the lookahead pathology occurs.
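
    The abstract describes the interplay of search depth and search width but gives no pseudocode; the following is a minimal Python sketch of one MCR decision step under an assumed generic simulator interface (the functions actions and step, and the uniform-random base policy, are placeholders for illustration, not details taken from the paper).

        import random

        def random_playout(state, actions, step, depth):
            """Cost of up to `depth` uniformly random steps from `state`."""
            total = 0.0
            for _ in range(depth):
                feasible = actions(state)
                if not feasible:                   # terminal state reached
                    break
                state, cost = step(state, random.choice(feasible))
                total += cost
            return total

        def mcr_action(state, actions, step, depth, width):
            """One MCR decision: average `width` playouts of `depth` steps per action."""
            best_action, best_avg = None, float("inf")
            for a in actions(state):
                total = 0.0
                for _ in range(width):
                    s, cost = step(state, a)       # sample the stochastic transition
                    total += cost + random_playout(s, actions, step, depth)
                if total / width < best_avg:
                    best_action, best_avg = a, total / width
            return best_action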

    On a novel optimisation model and solution method for tactical railway maintenance planning

    Within the ACEM-Rail project of the European Seventh Framework Programme, new measurement and inspection techniques for monitoring track condition are being developed. These techniques will substantially improve the prediction of future track condition. To our knowledge, mid-term maintenance planning currently covers projects and preventive tasks, but predictions of track condition have not yet been incorporated into the planning process. To exploit this new kind of information, one task within the ACEM-Rail project is the development of methods for planning predictive maintenance tasks alongside preventive and corrective ones over a mid-term planning horizon. The scope of mid-term, or tactical, maintenance planning is the selection and combination of tasks and their allocation to the time intervals in which they will be executed. The result is a coarse maintenance plan that defines which tasks are combined into larger tasks, as well as the time intervals for executing the selected tasks. This tactical plan serves as the basis for booking future track possessions and for scheduling the maintenance tasks in detail. This paper presents an algorithmic approach that can react to dynamic and uncertain changes arising from updates to the track-condition predictions. To this end, optimisation algorithms are embedded within a rolling planning process (sketched below), making it possible to respond to updated information on track condition by adapting the tactical plan. A novel optimisation method is developed to generate cost-effective and robust solutions by looking ahead into the future and evaluating different solutions across several scenarios.
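
    The rolling planning process lends itself to a compact sketch. The loop below is a generic receding-horizon skeleton, not the paper's optimisation method: optimise and observe_update are hypothetical stand-ins for the problem-specific planner and the track-condition prediction update.

        def rolling_plan(initial_state, horizon, optimise, observe_update, n_periods):
            """Re-optimise the tactical plan over the remaining look-ahead window
            each period, but commit only the first interval's decisions."""
            state = initial_state
            executed = []
            for t in range(n_periods):
                plan = optimise(state, horizon)        # plan over the whole window
                committed = plan[0]                    # execute the first interval
                executed.append(committed)
                state = observe_update(state, committed)  # fresh track predictions
            return executed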

    Parallel Nested Monte-Carlo Search

    We address the parallelization of a Monte-Carlo search algorithm. On a cluster of 64 cores we obtain a speedup of 56 for the parallelization of Morpion Solitaire. An algorithm that behaves better than a naive one on heterogeneous clusters is also detailed.
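
    Nested Monte-Carlo search itself can be stated compactly; the sketch below shows the single-threaded algorithm that the paper parallelises, with legal_moves, play, and score as assumed problem-interface placeholders. The paper's cluster-level parallelisation (for instance, distributing the lower-level searches across cores) is not reproduced here.

        import random

        def playout(state, legal_moves, play, score):
            """Level-0 search: a uniformly random playout to a terminal state."""
            seq = []
            while legal_moves(state):
                m = random.choice(legal_moves(state))
                state = play(state, m)
                seq.append(m)
            return score(state), seq

        def nested(state, level, legal_moves, play, score):
            """Level-`level` nested Monte-Carlo search (maximising score)."""
            if level == 0:
                return playout(state, legal_moves, play, score)
            best_score, best_seq, seq = float("-inf"), [], []
            while legal_moves(state):
                for m in legal_moves(state):
                    # Evaluate each move with a search one level below.
                    s, rest = nested(play(state, m), level - 1,
                                     legal_moves, play, score)
                    if s > best_score:
                        best_score, best_seq = s, seq + [m] + rest
                m = best_seq[len(seq)]        # follow the best sequence found so far
                state = play(state, m)
                seq.append(m)
            return best_score, best_seq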

    Surveillance Versus Reconnaissance: An Entropy Based Model

    With the advancing capabilities of Intelligence, Surveillance, and Reconnaissance (ISR) assets and sensors, effective utilization of these resources continues to pose a challenge to military decision makers. The methodology developed here explores the allocation of ISR assets while balancing the detection of new targets against the surveillance of already detected targets, using entropy as a Measure of Effectiveness (MOE). Scenarios with an unknown number of static and moving targets in a bounded geographical region are considered. A baseline model was built to examine four different search algorithms: random, raster, greedy, and a rollout algorithm based on dynamic programming. A space-filling Nearly Orthogonal Latin Hypercube experimental design was applied to generate data for four MOEs: step entropy, average entropy, number of targets found, and time steps to completion. Based on statistical analysis and time series plots, the rollout algorithm's performance dominated the other algorithms considered. In addition to minimizing uncertainty in the first 100 time steps of the run, the rollout algorithm also found the highest number of targets in the fixed-time-step scenario and, in the exhaustive target detection scenario, discovered all of the targets in the region in fewer time steps. Based on these results, the rollout algorithm provides superior performance in the allocation of ISR assets while balancing the detection of new targets against the surveillance of already detected targets.
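
    As a concrete illustration of entropy as an MOE, the sketch below computes the Shannon entropy of a grid of cell-occupancy probabilities; the grid model and the cell-independence assumption are for illustration only, not details taken from the thesis.

        import numpy as np

        def grid_entropy(p):
            """Shannon entropy (bits) of a grid of cell-occupancy probabilities.
            p[i, j] is the estimated probability that cell (i, j) holds a target;
            the sum measures the searcher's remaining uncertainty (lower is better)."""
            p = np.clip(p, 1e-12, 1 - 1e-12)                 # avoid log(0)
            cell = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
            return cell.sum()

        # A uniform prior over a 10x10 grid is maximally uncertain: 100 bits.
        print(grid_entropy(np.full((10, 10), 0.5)))          # 100.0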

    Solving Markov decision processes for network-level post-hazard recovery via simulation optimization and rollout

    Computation of optimal recovery decisions for post-hazard community resilience assurance is a combinatorial decision-making problem under uncertainty. It involves solving a large-scale optimization problem, which is significantly aggravated by the introduction of uncertainty. In this paper, we draw upon established tools from multiple research communities to provide an effective solution to this challenging problem. We provide a stochastic model of damage to the water network (WN) within a testbed community following a severe earthquake and compute near-optimal recovery actions for restoration of the water network. We formulate this stochastic decision-making problem as a Markov Decision Process (MDP) and solve it using a popular class of heuristic algorithms known as rollout. A simulation-based representation of MDPs is utilized in conjunction with rollout and the Optimal Computing Budget Allocation (OCBA) algorithm to address the resulting stochastic simulation optimization problem. Our method employs non-myopic planning with efficient use of the simulation budget. We show, through simulation results, that rollout fused with OCBA performs competitively with rollout under total equal allocation (TEA) at a meagre simulation budget of 5-10% of that required by rollout with TEA, which is a crucial step towards addressing large-scale community recovery problems following natural disasters. Comment: Submitted to Simulation Optimization for Cyber Physical Energy Systems (Special Session) at the 14th IEEE International Conference on Automation Science and Engineering.
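
    A minimal sketch of the simulation-based rollout step follows, shown with total equal allocation (TEA) for simplicity; OCBA would replace the fixed per-action replication count with a sequential allocation that favours close, high-variance candidates. All function names (step, base_policy, and so on) are assumed interfaces, not the paper's code.

        def rollout_value(state, action, step, base_policy, horizon, n_reps):
            """Monte-Carlo estimate of the cost of taking `action` in `state`
            and then following `base_policy` (simulation-based MDP model)."""
            total = 0.0
            for _ in range(n_reps):
                s, cost = step(state, action)
                for _ in range(horizon):
                    s, c = step(s, base_policy(s))
                    cost += c
                total += cost
            return total / n_reps

        def rollout_action(state, actions, step, base_policy, horizon, budget):
            """One rollout decision under TEA: the simulation budget is split
            evenly across candidate actions."""
            acts = actions(state)
            reps = max(1, budget // len(acts))
            values = {a: rollout_value(state, a, step, base_policy, horizon, reps)
                      for a in acts}
            return min(values, key=values.get)           # minimise expected cost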

    Guidance of Autonomous Amphibious Vehicles for Flood Rescue Support

    We develop a path-planning algorithm to guide autonomous amphibious vehicles (AAVs) for flood rescue support missions. Specifically, we develop an algorithm to control multiple AAVs to reach and rescue multiple victims (also called targets) in a 2D flood scenario, where the flood water flows across the scene and the targets drift along the flood stream. A target is said to be rescued if an AAV lies within a circular region of a certain radius around the target. The goal is to control the AAVs such that each target gets rescued while optimizing a certain performance objective. The algorithm design is based on the theory of partially observable Markov decision processes (POMDPs). In practice, POMDP problems are hard to solve exactly, so we use an approximation method called nominal belief-state optimization (NBO). We compare the performance of the NBO approach with that of a greedy approach.
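
    A rough sketch of the NBO idea, as described in the POMDP literature: each candidate action sequence is scored by propagating the belief along a single nominal trajectory (noise terms held at their means) instead of branching over all possible observations. The interface names predict and reward are hypothetical placeholders.

        def nbo_plan(belief, candidate_plans, predict, reward, horizon):
            """Score each candidate action sequence on a single nominal
            belief trajectory; return the best-scoring plan."""
            best_plan, best_value = None, float("-inf")
            for plan in candidate_plans:       # e.g. an enumerated/sampled set
                b, value = belief, 0.0
                for a in plan[:horizon]:
                    b = predict(b, a)          # nominal successor belief
                    value += reward(b)
                if value > best_value:
                    best_plan, best_value = plan, value
            return best_plan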

    Deep Reinforcement Learning for Approximate Policy Iteration: Convergence Analysis and a Post-Earthquake Disaster Response Case Study

    Approximate Policy Iteration (API) is a class of Reinforcement Learning (RL) algorithms that seek to solve the long-run discounted reward Markov Decision Process (MDP) via the policy iteration paradigm, without learning the transition model in the underlying Bellman equation. Unfortunately, these algorithms suffer from a defect known as chattering, in which the solution (policy) delivered in each iteration of the algorithm oscillates between improved and worsened policies, leading to sub-optimal behavior. Two causes for this, traced to the crucial policy improvement step, are: (i) inaccuracies in the policy improvement function, and (ii) the exploration/exploitation tradeoff integral to this step, which generates variability in performance. Both of these defects are amplified by simulation noise. Deep RL belongs to a newer class of algorithms in which the resolution of the learning process is refined via mechanisms such as experience replay and/or deep neural networks for improved performance. In this paper, a new deep learning approach is developed for API which employs a more accurate policy improvement function, via an enhanced-resolution Bellman equation, thereby reducing chattering and eliminating the need for exploration in the policy improvement step. Versions of the new algorithm for both the long-run discounted MDP and semi-MDP are presented. Convergence properties of the new algorithm are studied mathematically, and a post-earthquake disaster response case study is employed to demonstrate the algorithm's efficacy numerically.
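
    The chattering phenomenon is easiest to see against the generic API loop, sketched below; evaluate and improve are abstract placeholders for the paper's simulation-based policy evaluation and (noisy) greedy improvement steps, not its actual algorithm.

        def approximate_policy_iteration(policy, evaluate, improve, n_iterations):
            """Generic API loop.  Because `evaluate` is inexact (simulation
            noise, function approximation), the greedy `improve` step can
            oscillate between better and worse policies across iterations --
            the chattering defect the paper addresses."""
            history = [policy]
            for _ in range(n_iterations):
                q_hat = evaluate(policy)   # approximate Q-values of current policy
                policy = improve(q_hat)    # (possibly noisy) greedy improvement
                history.append(policy)
            return policy, history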