2 research outputs found

The Value-of-Information in Matching with Queues

Author: Bishop C.
Chung F.
Publication date: 27/03/2015

We consider the problem of optimal matching with queues in dynamic systems and investigate the value-of-information. In such systems, the operators match tasks and resources stored in queues, with the objective of maximizing the system utility of the matching reward profile, minus the average matching cost. This problem appears in many practical systems, and the main challenges are the no-underflow constraints and the lack of matching-reward information and system dynamics statistics. We develop two online matching algorithms, Learning-aided Reward optimAl Matching ($\mathtt{LRAM}$) and Dual-$\mathtt{LRAM}$ ($\mathtt{DRAM}$), to effectively resolve both challenges. Both algorithms are equipped with a learning module for estimating the matching-reward information, while $\mathtt{DRAM}$ incorporates an additional module for learning the system dynamics. We show that both algorithms achieve an $O(\epsilon+\delta_r)$ close-to-optimal utility performance for any $\epsilon>0$, while $\mathtt{DRAM}$ achieves a faster convergence speed and a better delay compared to $\mathtt{LRAM}$, i.e., $O(\delta_z/\epsilon + \log(1/\epsilon)^2)$ delay and $O(\delta_z/\epsilon)$ convergence under $\mathtt{DRAM}$ compared to $O(1/\epsilon)$ delay and convergence under $\mathtt{LRAM}$ ($\delta_r$ and $\delta_z$ are maximum estimation errors for reward and system dynamics). Our results reveal that information of different system components can play very different roles in algorithm performance and provide a systematic way for designing joint learning-control algorithms for dynamic systems.
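To get a feel for how the delay orders quoted in this abstract scale, the following minimal sketch evaluates them numerically. It assumes every hidden constant equals 1 and reads the logarithmic term as the square of the natural logarithm; the abstract states only orders of growth, so the numbers are illustrative and not results from the paper. The function names `lram_delay_order` and `dram_delay_order` are hypothetical.

```python
import math


def lram_delay_order(eps: float) -> float:
    """Order of the LRAM delay, 1/eps, with the hidden constant assumed to be 1."""
    return 1.0 / eps


def dram_delay_order(eps: float, delta_z: float) -> float:
    """Order of the DRAM delay, delta_z/eps + log(1/eps)^2, constants assumed to be 1."""
    return delta_z / eps + math.log(1.0 / eps) ** 2


if __name__ == "__main__":
    delta_z = 0.1  # assumed maximum estimation error for the system dynamics
    for eps in (1e-1, 1e-2, 1e-3):
        print(f"eps={eps:g}  LRAM ~ {lram_delay_order(eps):.1f}  "
              f"DRAM ~ {dram_delay_order(eps, delta_z):.1f}")
```

Under these assumptions the DRAM expression grows more slowly as `eps` shrinks whenever `delta_z` is small, which is the qualitative point the abstract makes about learning the system dynamics.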

arXiv.org e-Print Archive

Crossref

Learning-aided Stochastic Network Optimization with Imperfect State Prediction

Author: Ananthanarayanan G.
Bertsekas D. P.
Tadrous J.
Xu K.
Zhao S.
Publication date: 06/07/2018

We investigate the problem of stochastic network optimization in the presence of imperfect state prediction and non-stationarity. Based on a novel distribution-accuracy curve prediction model, we develop the predictive learning-aided control (PLC) algorithm, which jointly utilizes historic and predicted network state information for decision making. PLC is an online algorithm that requires zero a priori system statistical information, and consists of three key components, namely sequential distribution estimation and change detection, dual learning, and online queue-based control. Specifically, we show that PLC simultaneously achieves good long-term performance, short-term queue size reduction, accurate change detection, and fast algorithm convergence. In particular, for stationary networks, PLC achieves a near-optimal $[O(\epsilon), O(\log^2(1/\epsilon))]$ utility-delay tradeoff. For non-stationary networks, PLC obtains an $[O(\epsilon), O(\log^2(1/\epsilon) + \min(\epsilon^{c/2-1}, e_w/\epsilon))]$ utility-backlog tradeoff for distributions that last $\Theta(\frac{\max(\epsilon^{-c}, e_w^{-2})}{\epsilon^{1+a}})$ time, where $e_w$ is the prediction accuracy and $a=\Theta(1)>0$ is a constant (the Backpressure algorithm \cite{neelynowbook} requires an $O(\epsilon^{-2})$ length for the same utility performance with a larger backlog). Moreover, PLC detects distribution change $O(w)$ slots faster with high probability ($w$ is the prediction size) and achieves an $O(\min(\epsilon^{-1+c/2}, e_w/\epsilon)+\log^2(1/\epsilon))$ convergence time. Our results demonstrate that state prediction (even imperfect) can help (i) achieve faster detection and convergence, and (ii) obtain better utility-delay tradeoffs.
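The abstract names sequential distribution estimation and change detection as one component of PLC but does not describe the procedure. The sketch below is therefore not the paper's method. It is a generic illustration of the idea, assuming a finite set of network states: it compares the empirical distributions of two adjacent sliding windows and flags a change when their total variation distance exceeds a threshold. All names and parameters (`detect_change`, `window`, `threshold`) are hypothetical.

```python
from collections import Counter


def empirical_distribution(samples):
    """Map each observed state to its relative frequency in `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return {state: count / total for state, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    states = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in states)


def detect_change(history, window, threshold):
    """Return the first time t at which the two adjacent windows ending at t
    differ by more than `threshold` in total variation, or None if none does."""
    for t in range(2 * window, len(history) + 1):
        older = empirical_distribution(history[t - 2 * window:t - window])
        recent = empirical_distribution(history[t - window:t])
        if total_variation(older, recent) > threshold:
            return t
    return None


if __name__ == "__main__":
    # Assumed toy trace: the state distribution switches after 20 slots.
    trace = [0] * 20 + [1] * 20
    print(detect_change(trace, window=5, threshold=0.5))
```

A larger window gives a less noisy estimate but reacts more slowly after a change. Per the abstract, predicted state information is what lets PLC detect changes $O(w)$ slots faster; this sketch uses only past observations and does not model that.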

arXiv.org e-Print Archive

Crossref