Aggregators have emerged as crucial tools for the coordination of
distributed, controllable loads. To be used effectively, an aggregator must be
able to communicate the available flexibility of the loads it controls, known
as the aggregate flexibility, to a system operator. However, most existing
aggregate flexibility measures are slow-timescale estimations, and much less
attention has been paid to real-time coordination between an aggregator and an
operator. In this paper, we consider solving an online
optimization in a closed-loop system and present a design of real-time
aggregate flexibility feedback, termed the maximum entropy feedback (MEF). In
addition to deriving analytic properties of the MEF, we combine learning and
control: we show that the MEF can be approximated using reinforcement learning
and used as a penalty term in a novel control algorithm, penalized predictive
control (PPC), which modifies vanilla model predictive control (MPC). The
benefits of our scheme are: (1) Efficient communication. An operator running
PPC does not need to know the exact states and constraints of the loads, only
the MEF. (2) Fast computation. PPC often has far fewer variables than an MPC
formulation. (3) Lower costs. We show that, under certain regularity
assumptions, PPC is optimal. We illustrate the efficacy of the
PPC using a dataset from an adaptive electric vehicle charging network and show
that PPC outperforms classical MPC.

Comment: 13 pages, 5 figures, extension of arXiv:2006.1381