
    Reinforcement Learning for the Unit Commitment Problem

    In this work we solve the day-ahead unit commitment (UC) problem by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling. We present two reinforcement learning algorithms and devise a third. We compare our results to previous work that uses simulated annealing (SA) and show a 27% improvement in operating costs, with a running time of 2.5 minutes (compared to 2.5 hours for the existing state of the art).
    Comment: Accepted and presented at IEEE PES PowerTech, Eindhoven 2015, paper ID 46273
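
    To make the MDP formulation concrete, the following is a minimal sketch of a toy day-ahead unit commitment problem solved with tabular Q-learning. The generator parameters, demand profile, and shortfall penalty are illustrative assumptions; this does not reproduce the paper's algorithms or results.

```python
# A minimal sketch (not the paper's method) of casting a toy day-ahead
# unit commitment problem as an MDP and solving it with tabular Q-learning.
# Unit parameters, demand profile, and penalty are illustrative assumptions.
import itertools
import random

HOURS = 24
DEMAND = [2 + (h % 12) // 4 for h in range(HOURS)]   # toy hourly demand
CAPACITY = [1, 1, 2]          # output of each generator when committed
FUEL_COST = [1.0, 1.5, 2.5]   # running cost per hour per generator
START_COST = [2.0, 2.0, 5.0]  # cost to switch a generator on
SHORTFALL_PENALTY = 50.0      # penalty per unit of unmet demand

# An action is a full on/off commitment decision for the coming hour.
ACTIONS = list(itertools.product([0, 1], repeat=len(CAPACITY)))

def step(hour, status, action):
    """Cost of applying a commitment decision given the previous unit status."""
    startup = sum(START_COST[i] for i in range(len(action))
                  if action[i] and not status[i])
    fuel = sum(FUEL_COST[i] for i in range(len(action)) if action[i])
    supplied = sum(CAPACITY[i] for i in range(len(action)) if action[i])
    shortfall = max(0, DEMAND[hour] - supplied)
    return startup + fuel + SHORTFALL_PENALTY * shortfall

Q = {}  # (state, action) -> estimated cost-to-go, state = (hour, status)
def q(s, a):
    return Q.get((s, a), 0.0)

ALPHA, EPS = 0.1, 0.2
for episode in range(5000):
    status = (0, 0, 0)
    for hour in range(HOURS):
        s = (hour, status)
        a = random.choice(ACTIONS) if random.random() < EPS else \
            min(ACTIONS, key=lambda x: q(s, x))
        cost = step(hour, status, a)
        s2 = (hour + 1, a)
        future = 0.0 if hour == HOURS - 1 else min(q(s2, a2) for a2 in ACTIONS)
        # Q-learning update on *costs*, so we minimise rather than maximise.
        Q[(s, a)] = q(s, a) + ALPHA * (cost + future - q(s, a))
        status = a

# Greedy rollout of the learned policy gives a full 24-hour schedule.
status, total = (0, 0, 0), 0.0
for hour in range(HOURS):
    a = min(ACTIONS, key=lambda x: q((hour, status), x))
    total += step(hour, status, a)
    status = a
print("scheduled cost:", total)
```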

    Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework

    We present a framework for addressing a class of sequential decision-making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision-making problems, modelled as infinite-horizon Markov decision processes (MDPs) with and without an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon entropy of the trajectories under the MDP and to determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. The resulting policy enhances the quality of exploration early in the learning process, and consequently allows faster convergence and robust solutions even in the presence of noisy data, as demonstrated in our comparisons with popular algorithms such as Q-learning, double Q-learning, and entropy-regularized soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can be non-convex with multiple poor local minima. Simulation results for a 5G small-cell network problem demonstrate successful determination of communication routes and small-cell locations. We also obtain sensitivity measures with respect to problem parameters and robustness to noisy environment data.
    Comment: 17 pages, 7 figures
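
    As a concrete illustration of the entropy-cost trade-off described above, the following is a minimal sketch of an entropy-regularized ("soft") Bellman backup on a toy MDP with an absorbing state. The costs, transitions, and annealing schedule are illustrative assumptions, not the paper's actual framework.

```python
# A minimal sketch of a soft (entropy-regularized) Bellman backup on a toy
# MDP with an absorbing state, in the spirit of maximum-entropy exploration.
# Costs, transitions, and the temperature schedule are assumptions.
import math

ACTIONS = ["a", "b"]                   # "goal" below is absorbing
COST = {("s0", "a"): 1.0, ("s0", "b"): 4.0,
        ("s1", "a"): 2.0, ("s1", "b"): 0.5}
NEXT = {("s0", "a"): "s1", ("s0", "b"): "goal",
        ("s1", "a"): "s0", ("s1", "b"): "goal"}

def soft_backup(V, temperature):
    """One sweep of V(s) = -T * log sum_a exp(-(c(s,a) + V(s')) / T)."""
    newV = {"goal": 0.0}
    for s in ("s0", "s1"):
        qs = [COST[s, a] + V[NEXT[s, a]] for a in ACTIONS]
        newV[s] = -temperature * math.log(
            sum(math.exp(-x / temperature) for x in qs))
    return newV

def policy(V, temperature, s):
    """Stochastic max-entropy policy: softmax over negative soft Q-values."""
    qs = {a: COST[s, a] + V[NEXT[s, a]] for a in ACTIONS}
    lo = min(qs.values())              # shift for numerical stability
    z = sum(math.exp(-(x - lo) / temperature) for x in qs.values())
    return {a: math.exp(-(qs[a] - lo) / temperature) / z for a in ACTIONS}

V = {"s0": 0.0, "s1": 0.0, "goal": 0.0}
for T in (5.0, 1.0, 0.1):              # anneal: explore broadly, then sharpen
    for _ in range(100):
        V = soft_backup(V, T)
    print(f"T={T}: policy at s0 =", policy(V, T, "s0"))
```

    At a high temperature the policy stays near-uniform (high trajectory entropy); as the temperature is lowered it concentrates on the cheap route s0 -> s1 -> goal, illustrating how annealed entropy regularization trades early exploration against expected cost.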

    Dynamic vehicle routing problems: Three decades and counting

    Since the late 1970s, much research activity has taken place on the class of dynamic vehicle routing problems (DVRP), with the period after 2000 witnessing a real explosion in related papers. Our paper sheds light on work in this area over more than three decades by developing a taxonomy of DVRP papers according to 11 criteria. These are (1) type of problem, (2) logistical context, (3) transportation mode, (4) objective function, (5) fleet size, (6) time constraints, (7) vehicle capacity constraints, (8) the ability to reject customers, (9) the nature of the dynamic element, (10) the nature of the stochasticity (if any), and (11) the solution method. We comment on technological vis-à-vis methodological advances for this class of problems and suggest directions for further research. The latter include alternative objective functions, vehicle speed as a decision variable, more explicit linkages of methodology to technological advances, and analysis of the worst-case or average-case performance of heuristics.
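
    Purely as an illustration, the 11 criteria above can be encoded as a record type for classifying individual papers. The field names paraphrase the survey's criteria; the example values are hypothetical and not taken from the survey.

```python
# An illustrative encoding of the survey's 11-criterion DVRP taxonomy as a
# record type. Field names paraphrase the criteria; values are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class DVRPTaxonomyEntry:
    problem_type: str          # (1)  e.g. "dynamic VRP with time windows"
    logistical_context: str    # (2)  e.g. "courier", "emergency services"
    transportation_mode: str   # (3)  e.g. "road", "maritime"
    objective_function: str    # (4)  e.g. "minimize total route cost"
    fleet_size: str            # (5)  "single vehicle" or "multiple vehicles"
    time_constraints: bool     # (6)  are time windows modelled?
    capacity_constraints: bool # (7)  are vehicle capacity limits modelled?
    can_reject_customers: bool # (8)  may requests be refused?
    dynamic_element: str       # (9)  e.g. "online customer requests"
    stochasticity: str         # (10) e.g. "stochastic demands", "none"
    solution_method: str       # (11) e.g. "insertion heuristic", "tabu search"

# Hypothetical classification of one paper under this scheme:
example = DVRPTaxonomyEntry(
    problem_type="dynamic VRP",
    logistical_context="parcel delivery",
    transportation_mode="road",
    objective_function="minimize total travel time",
    fleet_size="multiple vehicles",
    time_constraints=True,
    capacity_constraints=True,
    can_reject_customers=False,
    dynamic_element="online customer requests",
    stochasticity="stochastic travel times",
    solution_method="tabu search",
)
print(example.solution_method)
```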