Network Aware Compute and Memory Allocation in Optically Composable Data Centres with Deep Reinforcement Learning and Graph Neural Networks

Abstract

Resource-disaggregated data centre architectures promise a means of pooling resources remotely within data centres, allowing for both more flexibility and greater resource efficiency in the increasingly important infrastructure-as-a-service business. This can be accomplished by using an optically circuit-switched backbone in the data centre network (DCN), which provides the bandwidth and latency guarantees required to ensure reliable performance when applications are run across non-local resource pools. However, resource allocation in this scenario requires both server-level \emph{and} network-level resources to be co-allocated to requests. The online nature and underlying combinatorial complexity of this problem, alongside the typical scale of DCN topologies, make exact solutions impossible and heuristic-based solutions sub-optimal or non-intuitive to design. We demonstrate that \emph{deep reinforcement learning}, where the policy is modelled by a \emph{graph neural network}, can be used to learn effective \emph{network-aware} and \emph{topologically-scalable} allocation policies end-to-end. Compared to state-of-the-art heuristics for network-aware resource allocation, the method achieves up to $20\%$ higher acceptance ratio; can achieve the same acceptance ratio as the best-performing heuristic with $3\times$ fewer networking resources available; and can maintain all-around performance when directly applied (with no further training) to DCN topologies with $10^2\times$ more servers than the topologies seen during training.

Comment: 10 pages + 1 appendix page, 8 figures
