70 research outputs found
Multistage decisions and risk in Markov decision processes: towards effective approximate dynamic programming architectures
The scientific domain of this thesis is optimization under uncertainty for discrete event stochastic systems. In particular, this thesis focuses on the practical implementation of the Dynamic Programming (DP) methodology for discrete event stochastic systems. Unfortunately, DP in its crude form suffers from three severe computational obstacles that make its implementation for such systems intractable. This thesis addresses these obstacles by developing and applying practical Approximate Dynamic Programming (ADP) techniques.
Specifically, for the purposes of this thesis we developed the following ADP techniques. The first, inspired by the Reinforcement Learning (RL) literature, is termed Real-Time Approximate Dynamic Programming (RTADP). The RTADP algorithm is intended for active learning while operating the stochastic system: as the agent constantly interacts with the uncertain environment, it accumulates experience that enables it to react more optimally in similar future situations. The second is an off-line ADP procedure.
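The learn-while-operating idea behind RTADP can be illustrated with a generic tabular Q-learning loop. This is a standard RL scheme, not the thesis's actual algorithm; the toy chain environment, reward values, and learning parameters below are illustrative assumptions.

```python
import random

# Toy deterministic chain environment: states 0..4, reward 1 on reaching
# state 4. An illustrative stand-in for a discrete event stochastic system.
N_STATES = 5
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

# Tabular Q-learning: the agent accumulates experience while interacting
# with the environment, improving its decisions in future similar states.
random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection (explore vs. exploit)
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1
        s2, r, done = step(s, ACTIONS[a])
        # temporal-difference update toward the one-step lookahead target
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# Greedy policy after learning (should prefer +1, moving right, everywhere).
policy = [ACTIONS[0 if Q[s][0] >= Q[s][1] else 1] for s in range(N_STATES - 1)]
print(policy)
```

The key point mirrored from the abstract is that no model of the environment is consulted: value estimates are improved purely from accumulated interaction experience.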
These ADP techniques are demonstrated on a variety of discrete event stochastic systems such as: i) a three stage queuing manufacturing network with recycle, ii) a supply chain of the light aromatics of a typical refinery, iii) several stochastic shortest path instances with a single starting and terminal state and iv) a general project portfolio management problem.
Moreover, this work addresses, in a systematic way, the issue of multistage risk within the DP framework by exploring the use of intra-period and inter-period risk-sensitive utility functions. In this thesis we propose a special structure for an intra-period utility and compare the derived policies in several multistage instances.

Ph.D. Committee Chair: Jay H. Lee; Committee Member: Martha Grover; Committee Member: Matthew J. Realff; Committee Member: Shabbir Ahmed; Committee Member: Stylianos Kavadia
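The risk-sensitive utility idea in the abstract above can be made concrete with the classic exponential utility from the literature; this is a standard textbook form, not necessarily the special intra-period structure the thesis proposes.

```latex
% Exponential (risk-sensitive) utility of a one-period reward R;
% \gamma > 0 sets the degree of risk aversion. The certainty equivalent
% penalizes variance, which is what makes the resulting policies risk-averse.
U(R) = -\frac{1}{\gamma}\, e^{-\gamma R},
\qquad
\mathrm{CE}(R) = -\frac{1}{\gamma}\ln \mathbb{E}\!\left[e^{-\gamma R}\right]
\approx \mathbb{E}[R] - \frac{\gamma}{2}\,\mathrm{Var}(R).
```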
Reinforcement Learning and Tree Search Methods for the Unit Commitment Problem
The unit commitment (UC) problem, which determines operating schedules of
generation units to meet demand, is a fundamental task in power systems
operation. Existing UC methods using mixed-integer programming are not
well-suited to highly stochastic systems. Approaches which more rigorously
account for uncertainty could yield large reductions in operating costs by
reducing spinning reserve requirements; operating power stations at higher
efficiencies; and integrating greater volumes of variable renewables. A
promising approach to solving the UC problem is reinforcement learning (RL), a
methodology for optimal decision-making which has been used to conquer
long-standing grand challenges in artificial intelligence. This thesis explores
the application of RL to the UC problem and addresses challenges including
robustness under uncertainty; generalisability across multiple problem
instances; and scaling to larger power systems than previously studied. To
tackle these issues, we develop guided tree search, a novel methodology
combining model-free RL and model-based planning. The UC problem is formalised
as a Markov decision process and we develop an open-source environment based on
real data from Great Britain's power system to train RL agents. In problems of
up to 100 generators, guided tree search is shown to be competitive with
deterministic UC methods, reducing operating costs by up to 1.4%. An advantage
of RL is that the framework can be easily extended to incorporate
considerations important to power systems operators such as robustness to
generator failure, wind curtailment or carbon prices. When generator outages
are considered, guided tree search saves over 2% in operating costs as
compared with methods using conventional reserve criteria.
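The combination of model-free RL and model-based planning described above can be sketched as a beam-style tree search in which a learned policy ranks actions and only the best-ranked ones are expanded. This is a minimal illustration, not the thesis's guided tree search: the two-generator toy problem, its capacities, costs, and the crude policy stand-in are all assumptions.

```python
from itertools import product

# Toy unit commitment: 2 generators over 3 periods (illustrative numbers).
CAPACITY = [60, 40]      # MW per generator
COST = [1.0, 2.0]        # variable cost per MW produced
FIXED = [50.0, 20.0]     # fixed cost per period a unit is committed
DEMAND = [70, 50, 90]    # MW demand per period
PENALTY = 1000.0         # cost per MW of unmet demand

def period_cost(commitment, demand):
    """Dispatch committed units cheapest-first against demand."""
    cost, remaining = 0.0, demand
    for g in sorted(range(2), key=lambda g: COST[g]):
        if commitment[g]:
            produced = min(CAPACITY[g], remaining)
            cost += FIXED[g] + COST[g] * produced
            remaining -= produced
    return cost + PENALTY * remaining

def policy_prior(commitment, demand):
    """Crude stand-in for a learned policy: prefer committing just
    enough capacity to cover demand (higher score is better)."""
    cap = sum(CAPACITY[g] for g in range(2) if commitment[g])
    shortfall = max(demand - cap, 0)
    return -(1000 * shortfall + cap)

def guided_search(beam_width=2):
    """Tree search where the policy prior prunes the branching factor."""
    beam = [((), 0.0)]  # (schedule so far, accumulated cost)
    for demand in DEMAND:
        children = []
        for schedule, cost in beam:
            actions = list(product([0, 1], repeat=2))
            # model-free component: rank actions by the (learned) policy
            actions.sort(key=lambda a: policy_prior(a, demand), reverse=True)
            # model-based component: expand only the top-ranked actions
            for a in actions[:beam_width]:
                children.append((schedule + (a,), cost + period_cost(a, demand)))
        beam = sorted(children, key=lambda sc: sc[1])[:beam_width]
    return beam[0]

schedule, cost = guided_search()
print(schedule, cost)
```

The design point is that the policy limits which branches the planner explores, so search cost grows with the beam width rather than with the full 2^N commitment space.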
Formal Methods for Autonomous Systems
Formal methods refer to rigorous, mathematical approaches to system
development and have played a key role in establishing the correctness of
safety-critical systems. The main building blocks of formal methods are models
and specifications, which are analogous to behaviors and requirements in system
design and give us the means to verify and synthesize system behaviors with
formal guarantees.
This monograph provides a survey of the current state of the art on
applications of formal methods in the autonomous systems domain. We consider
correct-by-construction synthesis under various formulations, including closed
systems, reactive, and probabilistic settings. Beyond synthesizing systems in
known environments, we address the concept of uncertainty and bound the
behavior of systems that employ learning using formal methods. Further, we
examine the synthesis of systems with monitoring, a mitigation technique for
ensuring that once a system deviates from expected behavior, it knows a way of
returning to normalcy. We also show how to overcome some limitations of formal
methods themselves with learning. We conclude with future directions for formal
methods in reinforcement learning, uncertainty, privacy, explainability of
formal methods, and regulation and certification.
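The monitoring idea surveyed above, detecting when a system deviates from its specification so mitigation can begin, can be sketched as a small hand-built safety automaton. The property, event names, and automaton below are illustrative assumptions, not the output of any particular formal-methods tool.

```python
# Runtime monitor for the safety property: "after a 'request', a 'grant'
# must occur before any 'shutdown'". States of the monitor automaton:
GOOD, PENDING, VIOLATED = "good", "pending", "violated"

def monitor(events):
    """Consume an event trace; report the first violation index, if any."""
    state = GOOD
    for i, ev in enumerate(events):
        if state == GOOD and ev == "request":
            state = PENDING          # now waiting for a grant
        elif state == PENDING and ev == "grant":
            state = GOOD             # obligation discharged
        elif state == PENDING and ev == "shutdown":
            return VIOLATED, i       # deviation detected; trigger mitigation
    return state, None

print(monitor(["request", "grant", "shutdown"]))  # ('good', None)
print(monitor(["request", "shutdown"]))           # ('violated', 1)
```

Reporting the index of the violating event is what lets a mitigation layer act at the moment the system leaves its expected behavior, as in the monitored-synthesis setting described above.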
Towards Optimal Real-Time Automotive Emission Control
The legal bounds on both toxic and carbon dioxide emissions from automotive vehicles are continuously being lowered, forcing manufacturers to rely on increasingly advanced methods to reduce emissions and improve fuel efficiency. Though great strides have been made to date, there is still large potential for continued improvement. Today, many subsystems in vehicles are optimized for static operation, performing well at constant operating points. Extending optimal operation to the dynamic case through the use of optimal control is one method for further improvement.
This thesis focuses on two subtopics that are crucial for implementing optimal control: dynamic modeling of vehicle subsystems, and methods for generating and evaluating computationally efficient optimal controllers. Though today's vehicles are outfitted with increasingly powerful computers, their computational performance is low compared to a conventional PC. Any controller must therefore be very computationally efficient in order to be feasibly implemented. Furthermore, a sufficiently accurate dynamic model of the subsystem is needed in order to determine the optimal control value. Though many dynamic models of the vehicle's subsystems exist, most do not fulfill the specific requirements set by optimal controllers.
This thesis comprises five papers that together probe methods of implementing dynamic optimal control in real time. Two papers develop optimal control methods, one introduces and studies a cold-start model of the three-way catalyst, one extends the three-way catalyst model and studies optimal cold-start control, and one considers fuel-optimal control of the engine speed in a series hybrid.
By combining the method and model papers, we open the potential to reduce toxic emissions by better managing cold-starts in hybrid vehicles, as well as to reduce carbon dioxide emissions by operating the engine more efficiently during transients.