Executing large deep neural networks (DNNs) on mobile devices consumes considerable amounts of critical resources, such as energy, and imposes stringent demands on hardware capabilities. In edge computing approaches, the execution of the models is offloaded to a compute-capable device positioned at the edge of 5G infrastructures. The main drawback of this
class of approaches is the need to transport information-rich signals over
wireless links with limited and time-varying capacity. The recent split
computing paradigm attempts to resolve this impasse by partitioning the execution of DNN models between the mobile device and the edge server, reducing the amount of data to be transmitted while imposing a minimal computing load on mobile
devices. In this context, we propose a novel split computing approach based on
slimmable ensemble encoders. The key advantage of our design is its ability to adapt the computational load and the size of the transmitted data in real time, with minimal overhead and delay. This is in contrast with existing approaches, where the same
adaptation requires costly context switching and model loading. Moreover, our
model outperforms existing solutions in terms of compression efficacy and
execution time, especially on resource-constrained mobile devices. We present a comprehensive comparison with the most advanced split computing solutions, as well as an experimental evaluation on GPU-less devices.
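
For illustration, a minimal sketch of the slimming mechanism is given below. It is not the paper's implementation: the PyTorch layer names, the width list, and the two-layer encoder are assumptions chosen only to show how a single set of weights can serve several widths, so that switching the operating point is an attribute update rather than a costly model reload.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableConv2d(nn.Conv2d):
    # Convolution that, at run time, uses only a leading fraction of its
    # output channels; the input is assumed already slimmed by the previous layer.
    def __init__(self, in_ch, out_ch, **kw):
        super().__init__(in_ch, out_ch, **kw)
        self.width = 1.0  # active fraction of output channels

    def forward(self, x):
        out_ch = max(1, int(self.out_channels * self.width))
        w = self.weight[:out_ch, : x.shape[1]]
        b = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, w, b, self.stride, self.padding)

class SlimmableEncoder(nn.Module):
    def __init__(self, widths=(0.25, 0.5, 1.0)):
        super().__init__()
        self.widths = widths
        self.layers = nn.ModuleList([
            SlimmableConv2d(3, 64, kernel_size=3, stride=2, padding=1),
            SlimmableConv2d(64, 128, kernel_size=3, stride=2, padding=1),
        ])

    def set_width(self, width):
        # Switching width is a constant-time attribute update: no second
        # model is loaded and no context switch takes place.
        assert width in self.widths
        for layer in self.layers:
            layer.width = width

    def forward(self, x):
        for layer in self.layers:
            x = F.relu(layer(x))
        return x  # compressed representation sent over the wireless link

encoder = SlimmableEncoder()
frame = torch.randn(1, 3, 224, 224)
encoder.set_width(0.25)   # weak device / poor link: less compute, smaller payload
small = encoder(frame)    # shape (1, 32, 56, 56)
encoder.set_width(1.0)    # full capacity when resources allow
large = encoder(frame)    # shape (1, 128, 56, 56)

Because every width reuses leading slices of the same weight tensors, no additional model needs to be resident in memory, which is what keeps the adaptation overhead minimal.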