Cooperative inference in Mobile Edge Computing (MEC), achieved by deploying
partitioned Deep Neural Network (DNN) models between resource-constrained user
equipments (UEs) and edge servers (ESs), has emerged as a promising paradigm.
Firstly, we consider scenarios with continuous Artificial Intelligence (AI) task
arrivals, such as object detection for video streams, and adopt a serial
queuing model to accurately evaluate the End-to-End (E2E) delay of
cooperative edge inference.
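As a minimal illustration of how a serial queuing model captures E2E delay (the notation below is introduced only for this sketch and is not taken from the paper), suppose the on-device computing stage, the uplink transmission stage, and the edge computing stage behave as M/M/1 queues in tandem with task arrival rate $\lambda$ and service rates $\mu_{\mathrm{UE}}$, $\mu_{\mathrm{tx}}$, and $\mu_{\mathrm{ES}}$. The mean E2E delay is then the sum of the per-stage sojourn times,
\[
T_{\mathrm{E2E}} = \frac{1}{\mu_{\mathrm{UE}}-\lambda} + \frac{1}{\mu_{\mathrm{tx}}-\lambda} + \frac{1}{\mu_{\mathrm{ES}}-\lambda},
\qquad \lambda < \min\{\mu_{\mathrm{UE}},\mu_{\mathrm{tx}},\mu_{\mathrm{ES}}\},
\]
where the chosen partition point shifts computational load between $\mu_{\mathrm{UE}}$ and $\mu_{\mathrm{ES}}$ and determines the intermediate feature size handled by the transmission stage.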
Secondly, to enhance the long-term performance of inference systems, we
formulate a multi-slot stochastic E2E delay optimization problem that jointly
considers model partitioning and multi-dimensional resource allocation.
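To make the structure concrete, a generic form of such a multi-slot stochastic program (the symbols below are assumed for illustration, not the paper's notation) is
\[
\min_{\{\mathbf{x}^t,\mathbf{r}^t\}} \; \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[D(\mathbf{x}^t,\mathbf{r}^t)\big]
\quad \text{s.t.} \quad
\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\big[g_k(\mathbf{x}^t,\mathbf{r}^t)\big] \le \bar{g}_k, \;\; \forall k,
\]
where $\mathbf{x}^t$ is the model partitioning decision in slot $t$, $\mathbf{r}^t$ the multi-dimensional resource allocation, $D(\cdot)$ the per-slot E2E delay, and $g_k(\cdot)$ the consumption of the $k$-th long-term-constrained resource with budget $\bar{g}_k$.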
Finally, to solve this problem, we introduce a Lyapunov-guided
Multi-Dimensional Optimization algorithm (LyMDO) that decouples the original
problem into per-slot deterministic problems, in which Deep Reinforcement
Learning (DRL) and convex optimization jointly optimize the partitioning
decisions and the complementary resource allocation.
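The Lyapunov-guided decoupling can be sketched in a few lines of Python; everything below (the virtual-queue update, the trade-off weight V, the toy delay and usage models, and the random candidate search that stands in for the DRL agent and the convex resource-allocation step) is an illustrative assumption rather than the paper's LyMDO implementation:

```python
import random

# Sketch of a Lyapunov drift-plus-penalty decomposition over time slots.
# A virtual queue Q[k] tracks each long-term resource constraint; each slot
# we minimize  V * delay + sum_k Q[k] * usage[k]  instead of the original
# multi-slot stochastic problem.

V = 50.0                        # delay vs. constraint-violation trade-off (assumed)
BUDGETS = {"energy": 1.0}       # per-slot average resource budgets (assumed)
Q = {k: 0.0 for k in BUDGETS}   # virtual queues, one per long-term constraint


def solve_per_slot(Q, state):
    """Stand-in for the per-slot solver: in LyMDO this role is played by a DRL
    agent choosing the DNN partition point plus a convex program allocating the
    remaining resources; here we simply score a few random candidates."""
    best, best_cost = None, float("inf")
    for _ in range(16):
        partition = random.randint(0, 10)           # candidate DNN split layer
        delay = abs(partition - state["load"])       # toy per-slot delay model
        usage = {"energy": 0.1 * partition}          # toy per-slot resource use
        cost = V * delay + sum(Q[k] * usage[k] for k in Q)
        if cost < best_cost:
            best, best_cost = (partition, delay, usage), cost
    return best


for t in range(100):                                 # time slots
    state = {"load": random.randint(0, 10)}          # observed channel/load state
    partition, delay, usage = solve_per_slot(Q, state)
    for k in Q:                                      # virtual-queue update:
        Q[k] = max(Q[k] + usage[k] - BUDGETS[k], 0)  # backlog grows on overuse
```

Minimizing the per-slot cost while the virtual queues accumulate constraint violations is what converts the long-term stochastic problem into a sequence of deterministic per-slot problems.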
Simulation results show that our approach effectively reduces the E2E delay
while satisfying long-term resource constraints.

Comment: 7 pages, 9 figures, 1 table, 1 algorithm, to be published in the IEEE
98th Vehicular Technology Conference (VTC2023-Fall).