
    Reinforcement Learning for the Unit Commitment Problem

    In this work we solve the day-ahead unit commitment (UC) problem by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling. We present two reinforcement learning algorithms and devise a third. We compare our results to previous work that uses simulated annealing (SA) and show a 27% improvement in operating costs, with a running time of 2.5 minutes (compared to 2.5 hours for the existing state of the art).
    Comment: Accepted and presented at IEEE PES PowerTech, Eindhoven 2015, paper ID 46273
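
    To make the MDP formulation concrete, the following is a minimal sketch of a toy day-ahead unit commitment problem solved with tabular Q-learning. The generator parameters, demand profile, and shortfall penalty are illustrative assumptions; this does not reproduce the paper's algorithms or results.

```python
# A minimal sketch (not the paper's method) of casting a toy day-ahead
# unit commitment problem as an MDP and solving it with tabular Q-learning.
# Unit parameters, demand profile, and penalty are illustrative assumptions.
import itertools
import random

HOURS = 24
DEMAND = [2 + (h % 12) // 4 for h in range(HOURS)]   # toy hourly demand
CAPACITY = [1, 1, 2]          # output of each generator when committed
FUEL_COST = [1.0, 1.5, 2.5]   # running cost per hour per generator
START_COST = [2.0, 2.0, 5.0]  # cost to switch a generator on
SHORTFALL_PENALTY = 50.0      # penalty per unit of unmet demand

# An action is a full on/off commitment decision for the coming hour.
ACTIONS = list(itertools.product([0, 1], repeat=len(CAPACITY)))

def step(hour, status, action):
    """Cost of applying a commitment decision given the previous unit status."""
    startup = sum(START_COST[i] for i in range(len(action))
                  if action[i] and not status[i])
    fuel = sum(FUEL_COST[i] for i in range(len(action)) if action[i])
    supplied = sum(CAPACITY[i] for i in range(len(action)) if action[i])
    shortfall = max(0, DEMAND[hour] - supplied)
    return startup + fuel + SHORTFALL_PENALTY * shortfall

Q = {}  # (state, action) -> estimated cost-to-go, state = (hour, status)
def q(s, a):
    return Q.get((s, a), 0.0)

ALPHA, EPS = 0.1, 0.2
for episode in range(5000):
    status = (0, 0, 0)
    for hour in range(HOURS):
        s = (hour, status)
        a = random.choice(ACTIONS) if random.random() < EPS else \
            min(ACTIONS, key=lambda x: q(s, x))
        cost = step(hour, status, a)
        s2 = (hour + 1, a)
        future = 0.0 if hour == HOURS - 1 else min(q(s2, a2) for a2 in ACTIONS)
        # Q-learning update on *costs*, so we minimise rather than maximise.
        Q[(s, a)] = q(s, a) + ALPHA * (cost + future - q(s, a))
        status = a

# Greedy rollout of the learned policy gives a full 24-hour schedule.
status, total = (0, 0, 0), 0.0
for hour in range(HOURS):
    a = min(ACTIONS, key=lambda x: q((hour, status), x))
    total += step(hour, status, a)
    status = a
print("scheduled cost:", total)
```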

    Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework

    We present a framework for addressing a class of sequential decision-making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision-making problems, modelled as infinite-horizon Markov decision processes (MDPs) with and without an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon entropy of the trajectories under the MDP and to determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. The resulting policy enhances the quality of exploration early in the learning process, and consequently allows faster convergence and robust solutions even in the presence of noisy data, as demonstrated in our comparisons with popular algorithms such as Q-learning, double Q-learning, and entropy-regularized soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can be non-convex with multiple poor local minima. Simulation results for a 5G small-cell network problem demonstrate successful determination of communication routes and small-cell locations. We also obtain sensitivity measures with respect to problem parameters and robustness to noisy environment data.
    Comment: 17 pages, 7 figures
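
    As a concrete illustration of the entropy-cost trade-off described above, the following is a minimal sketch of an entropy-regularized ("soft") Bellman backup on a toy MDP with an absorbing state. The costs, transitions, and annealing schedule are illustrative assumptions, not the paper's actual framework.

```python
# A minimal sketch of a soft (entropy-regularized) Bellman backup on a toy
# MDP with an absorbing state, in the spirit of maximum-entropy exploration.
# Costs, transitions, and the temperature schedule are assumptions.
import math

ACTIONS = ["a", "b"]                   # "goal" below is absorbing
COST = {("s0", "a"): 1.0, ("s0", "b"): 4.0,
        ("s1", "a"): 2.0, ("s1", "b"): 0.5}
NEXT = {("s0", "a"): "s1", ("s0", "b"): "goal",
        ("s1", "a"): "s0", ("s1", "b"): "goal"}

def soft_backup(V, temperature):
    """One sweep of V(s) = -T * log sum_a exp(-(c(s,a) + V(s')) / T)."""
    newV = {"goal": 0.0}
    for s in ("s0", "s1"):
        qs = [COST[s, a] + V[NEXT[s, a]] for a in ACTIONS]
        newV[s] = -temperature * math.log(
            sum(math.exp(-x / temperature) for x in qs))
    return newV

def policy(V, temperature, s):
    """Stochastic max-entropy policy: softmax over negative soft Q-values."""
    qs = {a: COST[s, a] + V[NEXT[s, a]] for a in ACTIONS}
    lo = min(qs.values())              # shift for numerical stability
    z = sum(math.exp(-(x - lo) / temperature) for x in qs.values())
    return {a: math.exp(-(qs[a] - lo) / temperature) / z for a in ACTIONS}

V = {"s0": 0.0, "s1": 0.0, "goal": 0.0}
for T in (5.0, 1.0, 0.1):              # anneal: explore broadly, then sharpen
    for _ in range(100):
        V = soft_backup(V, T)
    print(f"T={T}: policy at s0 =", policy(V, T, "s0"))
```

    At a high temperature the policy stays near-uniform (high trajectory entropy); as the temperature is lowered it concentrates on the cheap route s0 -> s1 -> goal, illustrating how annealed entropy regularization trades early exploration against expected cost.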

    Dynamic vehicle routing problems: Three decades and counting

    Since the late 1970s, much research activity has taken place on the class of dynamic vehicle routing problems (DVRP), with the period after 2000 witnessing a real explosion in related papers. Our paper sheds light on work in this area over more than three decades by developing a taxonomy of DVRP papers according to 11 criteria. These are (1) type of problem, (2) logistical context, (3) transportation mode, (4) objective function, (5) fleet size, (6) time constraints, (7) vehicle capacity constraints, (8) the ability to reject customers, (9) the nature of the dynamic element, (10) the nature of the stochasticity (if any), and (11) the solution method. We comment on technological vis-à-vis methodological advances for this class of problems and suggest directions for further research. The latter include alternative objective functions, vehicle speed as a decision variable, more explicit linkages of methodology to technological advances, and analysis of the worst-case or average-case performance of heuristics.
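
    Purely as an illustration, the 11 criteria above can be encoded as a record type for classifying individual papers. The field names paraphrase the survey's criteria; the example values are hypothetical and not taken from the survey.

```python
# An illustrative encoding of the survey's 11-criterion DVRP taxonomy as a
# record type. Field names paraphrase the criteria; values are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class DVRPTaxonomyEntry:
    problem_type: str          # (1)  e.g. "dynamic VRP with time windows"
    logistical_context: str    # (2)  e.g. "courier", "emergency services"
    transportation_mode: str   # (3)  e.g. "road", "maritime"
    objective_function: str    # (4)  e.g. "minimize total route cost"
    fleet_size: str            # (5)  "single vehicle" or "multiple vehicles"
    time_constraints: bool     # (6)  are time windows modelled?
    capacity_constraints: bool # (7)  are vehicle capacity limits modelled?
    can_reject_customers: bool # (8)  may requests be refused?
    dynamic_element: str       # (9)  e.g. "online customer requests"
    stochasticity: str         # (10) e.g. "stochastic demands", "none"
    solution_method: str       # (11) e.g. "insertion heuristic", "tabu search"

# Hypothetical classification of one paper under this scheme:
example = DVRPTaxonomyEntry(
    problem_type="dynamic VRP",
    logistical_context="parcel delivery",
    transportation_mode="road",
    objective_function="minimize total travel time",
    fleet_size="multiple vehicles",
    time_constraints=True,
    capacity_constraints=True,
    can_reject_customers=False,
    dynamic_element="online customer requests",
    stochasticity="stochastic travel times",
    solution_method="tabu search",
)
print(example.solution_method)
```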