A two-level Markov model for packet loss in UDP/IP-based real-time video applications targeting residential users
The packet loss characteristics of Internet paths that include residential broadband links are not well understood, and there are no good models for their behaviour. This complicates the design of real-time video applications targeting home users, since it is difficult to choose appropriate error correction and concealment algorithms without a good model for the types of loss observed. Using measurements of residential broadband networks in the UK and Finland, we show that existing models for packet loss, such as the Gilbert model and simple hidden Markov models, do not effectively model the loss patterns seen in this environment. We present a new two-level Markov model for packet loss that can more accurately describe the characteristics of these links, and quantify the effectiveness of this model. We demonstrate that our new packet loss model allows for improved application design, by using it to model the performance of forward error correction on such links.
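The abstract contrasts the proposed two-level model with the classic Gilbert model. As a point of reference, a two-state Gilbert loss simulator can be sketched as follows; the parameter values are illustrative, not taken from the paper:

```python
import random

def simulate_gilbert(n_packets, p, q, loss_good=0.0, loss_bad=1.0, seed=0):
    """Simulate packet loss with a two-state Gilbert model.

    p: P(good -> bad) transition probability per packet.
    q: P(bad -> good) transition probability per packet.
    loss_good/loss_bad: per-state loss probabilities.
    Returns a list of booleans (True = packet lost).
    """
    rng = random.Random(seed)
    state_bad = False
    losses = []
    for _ in range(n_packets):
        loss_prob = loss_bad if state_bad else loss_good
        losses.append(rng.random() < loss_prob)
        # State transition for the next packet
        if state_bad:
            if rng.random() < q:
                state_bad = False
        else:
            if rng.random() < p:
                state_bad = True
    return losses

trace = simulate_gilbert(10_000, p=0.01, q=0.3)
print(sum(trace) / len(trace))  # overall loss rate, roughly p / (p + q)
```

The long-run loss rate of this model is the stationary probability of the bad state, p / (p + q), which is what makes it tractable but, per the paper, too simple for residential broadband loss patterns.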
FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping
The dynamic request patterns of machine learning (ML) inference workloads have driven an increasing trend towards exploiting serverless computing for scalable ML model serving. However, today's serverless platforms lack efficient support for GPUs -- provisioning functions on GPUs incurs extremely high overhead, forcing them to be kept long-running, even when idle, to reduce cold starts. This leads to significant resource waste for ML inference and hinders pay-per-use billing for GPUs.

In this paper, we present FaaSwap, a serverless platform enabling fine-grained, request-level GPU sharing for resource-efficient ML inference. FaaSwap leverages model swapping to support fast inference execution at low resource cost. It keeps models on a host with a large amount of cheap memory and quickly swaps them to GPUs when requested, reducing per-function keep-alive cost and enabling efficient GPU sharing across many more functions. FaaSwap also supports swapping models between GPUs for load balancing and improved inference performance. In FaaSwap, we design sophisticated request scheduling and memory management algorithms that efficiently exploit model swapping to reduce GPU cost and meet latency service-level objectives (SLOs) for all inference functions. We have implemented and integrated FaaSwap into Alibaba Cloud Function Compute (FC), one of the world's largest commercial serverless platforms. Evaluation results show that FaaSwap achieves low-latency model swapping, efficiently shares a GPU across hundreds of functions, and satisfies per-function latency SLOs at scale.
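The core idea of keeping models in host memory and swapping them onto a bounded set of GPU slots on demand can be sketched as a simple LRU-managed cache. This is a toy illustration under assumed names and numbers, not FaaSwap's actual scheduler or memory manager:

```python
from collections import OrderedDict

class ModelSwapper:
    """Toy sketch of request-level GPU sharing via model swapping.

    Models live permanently in (cheap) host memory; a bounded number
    of GPU slots hold the models currently needed. On a request for a
    model not resident on the GPU, the least-recently-used resident
    model is evicted and the requested one is swapped in.
    """
    def __init__(self, gpu_slots):
        self.gpu_slots = gpu_slots
        self.resident = OrderedDict()  # model name -> None, in LRU order
        self.swap_ins = 0

    def infer(self, model):
        if model in self.resident:
            self.resident.move_to_end(model)  # hit: refresh LRU position
            return "hit"
        if len(self.resident) >= self.gpu_slots:
            self.resident.popitem(last=False)  # evict LRU resident model
        self.resident[model] = None  # "swap" model from host to GPU
        self.swap_ins += 1
        return "swap-in"

sw = ModelSwapper(gpu_slots=2)
print([sw.infer(m) for m in ["a", "b", "a", "c", "b"]])
# ['swap-in', 'swap-in', 'hit', 'swap-in', 'swap-in']
```

A real system would also overlap swap transfers with execution and account for per-model latency SLOs when choosing what to evict; the sketch captures only the sharing effect of swapping.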
- …