623 research outputs found
Mean-Variance Optimization in Markov Decision Processes
We consider finite horizon Markov decision processes under performance
measures that involve both the mean and the variance of the cumulative reward.
We show that either randomized or history-based policies can improve
performance. We prove that the complexity of computing a policy that maximizes
the mean reward under a variance constraint is NP-hard for some cases, and
strongly NP-hard for others. We finally offer pseudopolynomial exact and
approximation algorithms.Comment: A full version of an ICML 2011 pape
From Bandits to Experts: On the Value of Side-Observations
We consider an adversarial online learning setting where a decision maker can
choose an action in every stage of the game. In addition to observing the
reward of the chosen action, the decision maker gets side observations on the
reward he would have obtained had he chosen some of the other actions. The
observation structure is encoded as a graph, where node i is linked to node j
if sampling i provides information on the reward of j. This setting naturally
interpolates between the well-known "experts" setting, where the decision maker
can view all rewards, and the multi-armed bandits setting, where the decision
maker can only view the reward of the chosen action. We develop practical
algorithms with provable regret guarantees, which depend on non-trivial
graph-theoretic properties of the information feedback structure. We also
provide partially-matching lower bounds.Comment: Presented at the NIPS 2011 conferenc
Reinforcement Learning for the Unit Commitment Problem
In this work we solve the day-ahead unit commitment (UC) problem, by
formulating it as a Markov decision process (MDP) and finding a low-cost policy
for generation scheduling. We present two reinforcement learning algorithms,
and devise a third one. We compare our results to previous work that uses
simulated annealing (SA), and show a 27% improvement in operation costs, with
running time of 2.5 minutes (compared to 2.5 hours of existing
state-of-the-art).Comment: Accepted and presented in IEEE PES PowerTech, Eindhoven 2015, paper
ID 46273
- …