The Value-of-Information in Matching with Queues
We consider the problem of \emph{optimal matching with queues} in dynamic
systems and investigate the value-of-information. In such systems, the
operators match tasks and resources stored in queues, with the objective of
maximizing the system utility of the matching reward profile, minus the average
matching cost. This problem arises in many practical systems, and its main
challenges are the no-underflow constraints and the lack of matching-reward
information and system-dynamics statistics. We develop two online matching
algorithms, Learning-aided Reward optimAl Matching (LRAM) and Dual-LRAM (DRAM),
to effectively resolve both challenges. Both algorithms are equipped with a
learning module for estimating the matching-reward information, while DRAM
incorporates an additional module for learning the system dynamics. We show
that both algorithms achieve close-to-optimal utility performance, and that
DRAM achieves a faster convergence speed and a better delay than LRAM, with
the achievable delay and convergence rates governed by the maximum estimation
errors for the reward and the system dynamics, respectively. Our results
reveal that information about different system components can play very
different roles in algorithm performance, and they provide a systematic way to
design joint learning-control algorithms for dynamic systems.
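To make the setting concrete, here is a minimal, hypothetical sketch (not the algorithms developed in the paper) of a queue-based matching rule that respects the no-underflow constraint and learns the per-match reward from observations. The threshold rule, the tradeoff parameter `V`, and the arrival/reward models are all illustrative assumptions:

```python
import random

random.seed(0)

def matching_step(tasks, resources, reward_est, cost, V=10.0):
    # No-underflow constraint: we can only match items actually stored
    # in the two queues, so matches are capped by the smaller backlog.
    m = min(tasks, resources)
    # Drift-plus-penalty-style rule: serve when the V-scaled estimated
    # net reward plus the backlog pressure is positive (V trades off
    # utility against queueing delay).
    if m > 0 and V * (reward_est - cost) + tasks + resources > 0:
        return tasks - m, resources - m, m
    return tasks, resources, 0

# The "learning module" is proxied by a running average of observed rewards.
tasks = resources = 0
reward_sum, reward_n = 0.0, 0
total_matched = 0
for t in range(1000):
    tasks += random.randint(0, 2)      # task arrivals this slot
    resources += random.randint(0, 2)  # resource arrivals this slot
    reward_est = reward_sum / reward_n if reward_n else 0.0
    tasks, resources, m = matching_step(tasks, resources, reward_est, cost=0.5)
    if m:
        # Observe a noisy per-match reward and refine the estimate.
        obs = 1.0 + random.uniform(-0.2, 0.2)
        reward_sum += obs * m
        reward_n += m
    total_matched += m
print(total_matched, tasks, resources)
```

The key structural point the sketch captures is that the matching decision is feasibility-limited by queue contents (no underflow) while the reward estimate improves only from matches that actually occur.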
Learning-aided Stochastic Network Optimization with Imperfect State Prediction
We investigate the problem of stochastic network optimization in the presence
of imperfect state prediction and non-stationarity. Based on a novel
distribution-accuracy curve prediction model, we develop the predictive
learning-aided control (PLC) algorithm, which jointly utilizes historical and
predicted network state information for decision making. PLC is an online
algorithm that requires zero a priori system statistical information, and
consists of three key components: sequential distribution estimation and
change detection, dual learning, and online queue-based control.
Specifically, we show that PLC simultaneously achieves good long-term
performance, short-term queue size reduction, accurate change detection, and
fast algorithm convergence. In particular, for stationary networks, PLC
achieves a near-optimal utility-delay tradeoff. For non-stationary networks,
PLC obtains a favorable utility-backlog tradeoff for distributions that last
sufficiently long, where the required duration depends on the prediction
accuracy (the Backpressure algorithm \cite{neelynowbook} requires a longer
interval for the same utility performance, with a larger backlog). Moreover,
PLC detects distribution changes faster with high probability, by a margin
governed by the prediction window size, and achieves fast convergence. Our
results demonstrate that state prediction (even imperfect) can help (i)
achieve faster detection and convergence, and (ii) obtain better utility-delay
tradeoffs.
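As a concrete illustration of the first PLC component (sequential distribution estimation and change detection, here combined with predicted states), the following hypothetical sketch flags a change when the recent empirical distribution, augmented with predicted samples, drifts from the long-term estimate. The window size, threshold, and total-variation test are illustrative assumptions, not the paper's procedure:

```python
from collections import Counter

def empirical(samples, support):
    # Empirical distribution of the observed network states.
    counts = Counter(samples)
    n = len(samples)
    return {s: counts[s] / n for s in support}

def tv_distance(p, q, support):
    # Total variation distance between two distributions on a finite support.
    return 0.5 * sum(abs(p[s] - q[s]) for s in support)

def detect_change(history, predicted, support, window=50, threshold=0.25):
    """Flag a distribution change when the recent window, augmented with
    (possibly imperfect) predicted states, deviates from the long-term
    empirical estimate built from older samples."""
    if len(history) < 2 * window:
        return False  # not enough samples to compare two windows yet
    long_term = empirical(history[:-window], support)
    recent = empirical(history[-window:] + predicted, support)
    return tv_distance(long_term, recent, support) > threshold

support = [0, 1]
history = [0] * 100 + [1] * 50   # underlying distribution shifted recently
predicted = [1] * 10             # predictions reflect the new regime
print(detect_change(history, predicted, support))  # → True
```

Appending the predicted samples to the recent window is what lets even imperfect predictions speed up detection: the shifted mass shows up in the comparison a few slots before it is fully observed.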