The effectiveness of many optimal network control algorithms (e.g.,
BackPressure) relies on the premise that all of the nodes are fully
controllable. However, these algorithms may yield poor performance in a
partially-controllable network, where a subset of nodes are uncontrollable and
follow an unknown policy. Such a partially-controllable model is of increasing
importance in real-world networked systems such as overlay-underlay networks.
In this paper, we design optimal network control algorithms that can stabilize
a partially-controllable network. We first study the scenario where
uncontrollable nodes use a queue-agnostic policy, and propose a low-complexity
throughput-optimal algorithm, called Tracking-MaxWeight (TMW), which enhances
the original MaxWeight algorithm by explicitly learning the policy used
by uncontrollable nodes. Next, we investigate the scenario where uncontrollable
nodes use a queue-dependent policy, and formulate the problem as an MDP with
unknown queueing dynamics. We propose a new reinforcement learning algorithm,
called Truncated Upper Confidence Reinforcement Learning (TUCRL), and prove
that TUCRL achieves tunable three-way tradeoffs between throughput, delay and
convergence rate.