Optimal Policies for the Management of a Plug-In Hybrid Electric Vehicle Swap Station
Optimizing operations at plug-in hybrid electric vehicle (PHEV) battery swap stations is motivated by the movement to make transportation cleaner and more efficient. A PHEV swap station allows PHEV owners to quickly exchange their depleted battery for a fully charged one. The PHEV-Swap Station Management Problem (PHEV-SSMP) is introduced, which models battery charging and discharging operations at a PHEV swap station facing nonstationary, stochastic demand for battery swaps, nonstationary prices for charging depleted batteries, and nonstationary prices for discharging fully charged batteries. Discharging through vehicle-to-grid aids power load balancing. The objective of the PHEV-SSMP is to determine the charging and discharging policy that maximizes expected total profit over a fixed time horizon. The PHEV-SSMP is formulated as a finite-horizon, discrete-time Markov decision problem, and an optimal policy is found using dynamic programming. Structural properties are derived, including sufficient conditions that ensure the existence of a monotone optimal policy. A computational experiment is developed using realistic demand and electricity pricing data, and the optimal policy is compared to two benchmark policies that are easily implemented by PHEV swap station managers. Two designed experiments yield policy insights regarding the management of PHEV swap stations, including the minimum battery level relative to the number of PHEVs in a local area, the incentive necessary to discharge, and the viability of PHEV swap stations under many conditions.
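The finite-horizon, discrete-time formulation described above can be solved by backward induction. The sketch below illustrates the idea on a heavily simplified single-station model; the horizon, battery count, prices, revenues, and demand distribution are all hypothetical stand-ins, not values or structure from the paper.

```python
import numpy as np

# Backward-induction sketch for a simplified battery swap-station MDP.
# State: number of charged batteries s. Action a: net batteries to charge
# (a > 0) or discharge via vehicle-to-grid (a < 0). All numbers are invented.
T = 24                     # decision epochs (e.g., hours)
B = 5                      # total batteries at the station
price_charge = 1.0 + 0.5 * np.sin(np.arange(T))     # nonstationary charging cost
price_discharge = 1.2 + 0.5 * np.sin(np.arange(T))  # nonstationary V2G revenue
swap_revenue = 3.0                                  # revenue per battery swap
demand_pmf = {0: 0.3, 1: 0.4, 2: 0.3}               # stationary here for brevity

V = np.zeros((T + 1, B + 1))   # V[t, s]: optimal profit-to-go from epoch t

for t in range(T - 1, -1, -1):
    for s in range(B + 1):
        best = -np.inf
        for a in range(-s, B - s + 1):              # keep 0 <= s + a <= B
            cost = price_charge[t] * max(a, 0) - price_discharge[t] * max(-a, 0)
            s_post = s + a
            exp_val = 0.0
            for d, p in demand_pmf.items():         # expectation over swap demand
                served = min(d, s_post)
                exp_val += p * (swap_revenue * served + V[t + 1, s_post - served])
            best = max(best, exp_val - cost)
        V[t, s] = best

print(V[0])   # expected profit-to-go from each initial charged-battery level
```

In this toy model the value function is nondecreasing in the number of charged batteries, which is the kind of structure the monotone-policy conditions in the abstract exploit.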
Configurable Markov Decision Processes
In many real-world problems, it is possible to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems, showing the benefits of environment configurability for the performance of the learned policy.
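The core Conf-MDP idea of jointly optimizing the policy and the environment configuration can be illustrated with plain alternating optimization on a toy tabular MDP. This is not the SPMI algorithm itself; the states, rewards, and the family of transition kernels parameterized by `theta` below are all made up for illustration.

```python
import numpy as np

# Toy alternating optimization of policy and environment configuration.
n_states, n_actions, gamma = 3, 2, 0.9
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]])   # R[s, a], hypothetical

def make_P(theta):
    # Configuration theta in [0, 1] biases all transitions toward state 2.
    P = np.full((n_states, n_actions, n_states), (1 - theta) / 2)
    P[:, :, 2] = theta
    return P

def policy_eval(P, pi):
    # Exact evaluation of deterministic policy pi under kernel P.
    Ppi = P[np.arange(n_states), pi]                 # (S, S)
    Rpi = R[np.arange(n_states), pi]
    return np.linalg.solve(np.eye(n_states) - gamma * Ppi, Rpi)

pi = np.zeros(n_states, dtype=int)
theta = 0.5
for _ in range(20):
    P = make_P(theta)
    V = policy_eval(P, pi)
    Q = R + gamma * P @ V                            # (S, A) action values
    pi = Q.argmax(axis=1)                            # policy improvement step
    theta = max((0.1, 0.5, 0.9),                     # configuration improvement
                key=lambda th: policy_eval(make_P(th), pi).mean())

print(pi, theta)
```

SPMI additionally bounds each policy and model update to guarantee monotonic improvement; the unguarded alternation above only conveys the two interleaved optimization directions.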
Tactical planning in healthcare using approximate dynamic programming
Tactical planning of resources in hospitals concerns elective patient admission planning and the intermediate-term allocation of resource capacities. Its main objectives are to achieve equitable access for patients, to serve the strategically agreed number of patients, and to use resources efficiently. We propose a method to develop a tactical resource allocation and patient admission plan that takes stochastic elements into consideration, thereby providing robust plans. Our method is developed in an Approximate Dynamic Programming (ADP) framework and copes with multiple resources, multiple time periods, and multiple patient groups with various uncertain treatment paths through the hospital and an uncertain number of arrivals in each time period, thereby integrating decision making for a chain of hospital resources. Computational results indicate that the ADP approach provides an accurate approximation of the value functions and is suitable for large problem instances at hospitals, on which it performs significantly better than two other heuristic approaches. Our ADP algorithm is generic, as various cost functions and basis functions can be used in various settings of tactical hospital management.
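The ADP approach with basis functions can be sketched on a drastically reduced model: a single resource, a one-dimensional waiting list, and a quadratic basis, with weights fit to sampled Bellman targets by least squares. The model and all parameters below are illustrative assumptions, not the paper's multi-resource, multi-patient-group formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# ADP sketch: linear value-function approximation fit per period.
# State: waiting-list size w. Decision: patients admitted a <= capacity.
capacity, horizon = 4, 8
wait_cost, arrivals = 1.0, [0, 1, 2, 3]        # uniform arrivals per period

phi = lambda w: np.array([1.0, w, w * w])      # quadratic basis in w
theta = [np.zeros(3) for _ in range(horizon + 1)]   # weights per period

for t in range(horizon - 1, -1, -1):
    states = rng.integers(0, 20, size=50)      # sampled waiting-list sizes
    targets = []
    for w in states:
        best = np.inf
        for a in range(min(w, capacity) + 1):
            cost = wait_cost * (w - a)         # holding cost on remaining list
            nxt = np.mean([theta[t + 1] @ phi(w - a + x) for x in arrivals])
            best = min(best, cost + nxt)       # sampled Bellman backup
        targets.append(best)
    X = np.stack([phi(w) for w in states])
    theta[t], *_ = np.linalg.lstsq(X, np.array(targets), rcond=None)

print(theta[0])   # fitted weights approximating the period-0 cost-to-go
```

The fitted approximation assigns a higher cost-to-go to longer waiting lists, which is the structure the admission plan then exploits; the paper's setting replaces this single scalar state with vectors of waiting patients per group and stage of the care path.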