In this paper, we investigate the power of online learning in stochastic
network optimization when system statistics are unknown {\it a priori}. We are
interested in understanding how information and learning can be efficiently
incorporated into system control techniques, and what the fundamental
benefits of doing so are. We propose two \emph{Online Learning-Aided Control}
techniques, OLAC and OLAC2, that explicitly utilize the
past system information in current system control via a learning procedure
called \emph{dual learning}. We prove strong performance guarantees of the
proposed algorithms: OLAC and OLAC2 achieve the
near-optimal $[O(\epsilon), O([\log(1/\epsilon)]^2)]$ utility-delay tradeoff,
and OLAC2 possesses an $O(\epsilon^{-2/3})$ convergence time.
OLAC and OLAC2 are probably the first algorithms that
simultaneously possess an explicit near-optimal delay guarantee and sub-linear
convergence time. Simulation results also confirm the superior performance of
the proposed algorithms in practice. To the best of our knowledge, our attempt
is the first to explicitly incorporate online learning into stochastic network
optimization and to demonstrate its power in both theory and practice.