We consider the problem of \emph{optimal matching with queues} in dynamic
systems and investigate the value of information. In such systems, the
operators match tasks and resources stored in queues, with the objective of
maximizing the system utility of the matching-reward profile minus the average
matching cost. This problem arises in many practical systems, and its main
challenges are the no-underflow constraints and the lack of matching-reward
information and system-dynamics statistics. We develop two online matching
algorithms: Learning-aided Reward optimAl Matching (LRAM) and
Dual-LRAM (DRAM) to effectively resolve both challenges.
Both algorithms are equipped with a learning module for estimating the
matching-reward information, while DRAM incorporates an additional
module for learning the system dynamics. We show that both algorithms achieve
a utility within $O(\epsilon+\delta_r)$ of the optimal for any
$\epsilon>0$, while DRAM achieves a faster convergence speed and a
smaller delay than LRAM, i.e., an $O(\delta_z/\epsilon+\log(1/\epsilon)^2)$
delay and an $O(\delta_z/\epsilon)$ convergence time under
DRAM, compared to an $O(1/\epsilon)$ delay and convergence time under
LRAM, where $\delta_r$ and $\delta_z$ denote the maximum estimation errors for
the reward and the system dynamics, respectively. Our results reveal that
information about different system components can play very different roles in
algorithm performance, and they provide a systematic way to design joint
learning-control algorithms for dynamic systems.