122 research outputs found
Online Convex Optimization with Binary Constraints
We consider online optimization with binary decision variables and convex
loss functions. We design a new algorithm, binary online gradient descent
(bOGD) and bound its expected dynamic regret. We provide a regret bound that
holds for any time horizon and a specialized bound for finite time horizons.
First, we express the regret as the sum of the tracking error of the relaxed,
continuous round optimum and the rounding error of our update; the former
decreases asymptotically with time under certain conditions. Then, we derive
a finite-time bound that is sublinear in time and linear in the cumulative
variation of the relaxed, continuous round optima. We apply bOGD to demand
response with thermostatically controlled loads, in which binary constraints
model discrete on/off settings. We also model uncertainty and varying load
availability, which depend on temperature deadbands, lockout of cooling units
and manual overrides. We test the performance of bOGD in several simulations
based on demand response. The simulations corroborate that the use of
randomization in bOGD does not significantly degrade performance while making
the problem more tractable.
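The abstract above describes a gradient update on a relaxed, continuous variable combined with randomized rounding to binary decisions. The following is a minimal sketch of that general idea, not the authors' bOGD: the quadratic tracking loss, the step size, the point at which the gradient is evaluated, and all function names (`project_unit_box`, `randomized_round`, `relaxed_ogd_step`, `run`) are assumptions made for illustration.

```python
import random


def project_unit_box(x):
    """Project each coordinate onto [0, 1], the relaxation of {0, 1}."""
    return [min(1.0, max(0.0, xi)) for xi in x]


def randomized_round(x, rng):
    """Draw a binary vector whose expectation equals the relaxed point x."""
    return [1 if rng.random() < xi else 0 for xi in x]


def relaxed_ogd_step(x, grad, step_size):
    """One projected gradient step on the relaxed variable."""
    return project_unit_box([xi - step_size * gi for xi, gi in zip(x, grad)])


def run(targets, n, step_size=0.1, seed=0):
    """Play one binary decision per round against a hypothetical loss.

    Assumed loss (not from the source): f_t(x) = (sum(x) - target_t)^2,
    e.g. n on/off loads tracking an aggregate setpoint.
    """
    rng = random.Random(seed)
    x = [0.5] * n  # relaxed iterate
    decisions = []
    for target in targets:
        decisions.append(randomized_round(x, rng))  # binary decision played
        g = 2.0 * (sum(x) - target)  # gradient at the relaxed point (assumption)
        x = relaxed_ogd_step(x, [g] * n, step_size)
    return decisions, x


decisions, x = run([3.0] * 50, n=4)
```

In this toy setting the relaxed iterate settles where the sum of its coordinates matches the setpoint, while every decision actually played remains binary.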
Approximate Multi-Agent Fitted Q Iteration
We formulate an efficient approximation for multi-agent batch reinforcement
learning, the approximate multi-agent fitted Q iteration (AMAFQI). We present a
detailed derivation of our approach. We propose an iterative policy search and
show that it yields a greedy policy with respect to multiple approximations of
the centralized, standard Q-function. In each iteration and policy evaluation,
AMAFQI requires a number of computations that scales linearly with the number
of agents, whereas the analogous number of computations increases exponentially
for the fitted Q iteration (FQI), one of the most commonly used approaches in
batch reinforcement learning. This property of AMAFQI is fundamental for the
design of a tractable multi-agent approach. We evaluate the performance of
AMAFQI and compare it to FQI in numerical simulations. Numerical examples
illustrate the significant computation time reduction when using AMAFQI instead
of FQI in multi-agent problems and corroborate the similar decision-making
performance of both approaches.
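To give a rough sense of the linear versus exponential scaling mentioned in the abstract above, the sketch below counts how many actions a greedy maximization would have to enumerate in two cases. It is only an illustration under stated assumptions: each agent is assumed to have the same number of discrete actions, the function names are hypothetical, and the counts are not the actual operation counts of AMAFQI or FQI.

```python
def joint_action_count(num_agents, actions_per_agent):
    """Joint actions enumerated by a centralized maximization (exponential)."""
    return actions_per_agent ** num_agents


def per_agent_action_count(num_agents, actions_per_agent):
    """Actions enumerated when each agent maximizes separately (linear)."""
    return num_agents * actions_per_agent


for m in (2, 5, 10):
    print(m, joint_action_count(m, 2), per_agent_action_count(m, 2))
```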
Dynamic and Distributed Online Convex Optimization for Demand Response of Commercial Buildings
We extend the regret analysis of the online distributed weighted dual
averaging (DWDA) algorithm [1] to the dynamic setting and provide the tightest
dynamic regret bound known to date with respect to the time horizon for a
distributed online convex optimization (OCO) algorithm. Our bound is linear in
the cumulative difference between consecutive optima and does not depend
explicitly on the time horizon. We use dynamic-online DWDA (D-ODWDA) and
formulate a performance-guaranteed distributed online demand response approach
for heating, ventilation, and air-conditioning (HVAC) systems of commercial
buildings. We show the performance of our approach for fast timescale demand
response in numerical simulations and obtain demand response decisions that
closely reproduce the centralized optimal ones.
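For readers unfamiliar with the terms in the abstract above, dynamic regret and the cumulative difference between consecutive optima are commonly written as follows. The notation ($f_t$, $x_t$, $x_t^\ast$, $\mathcal{X}$, $V_T$) is a standard convention assumed here, not taken from the source, and the last line only restates the qualitative form of the bound described in the abstract; the constants and exact statement are not given there.

```latex
\begin{align}
  \mathrm{Reg}^{d}_{T} &= \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^{\ast}),
    \qquad x_t^{\ast} \in \operatorname*{arg\,min}_{x \in \mathcal{X}} f_t(x), \\
  V_T &= \sum_{t=2}^{T} \left\lVert x_t^{\ast} - x_{t-1}^{\ast} \right\rVert, \\
  \mathrm{Reg}^{d}_{T} &\le O\!\left(1 + V_T\right).
\end{align}
```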