Decentralized bilevel optimization has received increasing attention recently
due to its foundational role in many emerging multi-agent learning paradigms
(e.g., multi-agent meta-learning and multi-agent reinforcement learning) over
peer-to-peer edge networks. However, to work with the limited computation and
communication capabilities of edge networks, a major challenge in developing
decentralized bilevel optimization techniques is to lower sample and
communication complexities. This motivates us to develop a new decentralized
bilevel optimization called DIAMOND (decentralized single-timescale stochastic
approximation with momentum and gradient-tracking). The contributions of this
paper are as follows: i) our DIAMOND algorithm adopts a single-loop structure
rather than following the natural double-loop structure of bilevel
optimization, which offers low computation and implementation complexity; ii)
compared to existing approaches, the DIAMOND algorithm does not require any
full gradient evaluations, which further reduces both sample and computational
complexities; iii) through a careful integration of momentum information and
gradient tracking techniques, we show that the DIAMOND algorithm achieves
O(ϵ^{-3/2}) sample and communication complexities for
reaching an ϵ-stationary solution, both of which are independent of
the dataset sizes and significantly improve upon existing works. Extensive
experiments also verify our theoretical findings.
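To make the single-timescale structure concrete, the following is a minimal sketch, in generic notation, of one decentralized iteration that combines consensus averaging, a momentum-based stochastic gradient estimate, and gradient tracking. The mixing matrix W, step sizes α and β, momentum weight η, the stochastic hypergradient surrogate, and the trackers u_i, v_i, d_i are illustrative placeholders, not taken from the paper; DIAMOND's exact recursions may differ.

% Assumed generic single-timescale update at agent i, iteration t (not the paper's exact recursions):
% W = gossip/mixing matrix, alpha/beta = step sizes, eta = momentum weight,
% \widetilde{\nabla} F_i = stochastic hypergradient surrogate evaluated on sample \xi.
\begin{align*}
y_i^{t+1} &= \textstyle\sum_{j} W_{ij}\, y_j^{t} \;-\; \beta\, u_i^{t}, \\
x_i^{t+1} &= \textstyle\sum_{j} W_{ij}\, x_j^{t} \;-\; \alpha\, v_i^{t}, \\
d_i^{t+1} &= \widetilde{\nabla} F_i\big(x_i^{t+1}, y_i^{t+1}; \xi_i^{t+1}\big)
            + (1-\eta)\Big(d_i^{t} - \widetilde{\nabla} F_i\big(x_i^{t}, y_i^{t}; \xi_i^{t+1}\big)\Big), \\
v_i^{t+1} &= \textstyle\sum_{j} W_{ij}\, v_j^{t} \;+\; d_i^{t+1} - d_i^{t},
\end{align*}
% with the lower-level tracker u_i^t maintained analogously; both levels are updated once per
% iteration (single loop) using only stochastic mini-batch gradients, no full gradient evaluations.

Roughly speaking, the momentum-style estimator d_i progressively damps stochastic-gradient noise without periodic full-gradient restarts, while the gradient-tracking step keeps the local directions v_i aligned with the network-wide average gradient despite decentralized communication.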