TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments
Deep neural networks (DNNs) have become core computation components within
low latency Function as a Service (FaaS) prediction pipelines, including image
recognition, object detection, natural language processing, speech synthesis,
and personalized recommendation pipelines. Cloud computing, as the de-facto
backbone of modern computing infrastructure for both enterprise and consumer
applications, has to be able to handle user-defined pipelines of diverse DNN
inference workloads while maintaining isolation and latency guarantees, and
minimizing resource waste. The current solution for guaranteeing isolation
within FaaS is suboptimal, suffering from "cold start" latency. A major cause
of this inefficiency is the need to move large amounts of model data within and
across servers. We propose TrIMS as a novel solution to address these issues.
Our proposed solution consists of a persistent model store across the GPU, CPU,
local storage, and cloud storage hierarchy, an efficient resource management
layer that provides isolation, and a succinct set of application APIs and
container technologies for easy and transparent integration with FaaS, Deep
Learning (DL) frameworks, and user code. We demonstrate our solution by
interfacing TrIMS with the Apache MXNet framework, achieving up to a 24x
speedup in latency for image classification models, up to a 210x speedup for
large models, and up to an 8x improvement in system throughput.
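The core mechanism is easiest to see in miniature. Below is a minimal sketch of the model-sharing idea, assuming a hypothetical process-wide ModelStore and a load_from_storage fallback; TrIMS's real store spans the GPU, CPU, local storage, and cloud storage hierarchy, and its actual APIs are not shown here.

```python
# Minimal sketch of the model-sharing idea behind TrIMS: functions look up a
# model in a shared, persistent store before paying the cost of loading it
# from storage. The store, its single tier, and `load_from_storage` are
# hypothetical illustrations, not TrIMS's actual API.
import threading

class ModelStore:
    """A process-wide cache standing in for TrIMS's GPU/CPU/disk hierarchy."""

    def __init__(self):
        self._models = {}            # model name -> loaded model object
        self._lock = threading.Lock()

    def get(self, name, loader):
        # Fast path: the model is already resident and can be shared.
        with self._lock:
            if name in self._models:
                return self._models[name]
        # Slow path ("cold start"): load once, then share across invocations.
        model = loader(name)
        with self._lock:
            self._models.setdefault(name, model)
            return self._models[name]

STORE = ModelStore()

def load_from_storage(name):
    # Placeholder for deserializing weights from local or cloud storage.
    return {"name": name, "weights": b"..."}

def handler(event):
    # A FaaS function body: model loading is amortized across invocations.
    model = STORE.get(event["model"], load_from_storage)
    return {"served_by": model["name"]}

print(handler({"model": "resnet50"}))
```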
Serving deep learning models in a serverless platform
Serverless computing has emerged as a compelling paradigm for the development
and deployment of a wide range of event based cloud applications. At the same
time, cloud providers and enterprise companies are heavily adopting machine
learning and Artificial Intelligence to either differentiate themselves, or
provide their customers with value added services. In this work we evaluate the
suitability of a serverless computing environment for the inferencing of large
neural network models. Our experimental evaluations are executed on the AWS
Lambda environment using the MXNet deep learning framework. Our experimental
results show that while the inferencing latency can be within an acceptable
range, longer delays due to cold starts can skew the latency distribution and
hence risk violating more stringent SLAs.
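For concreteness, here is a minimal sketch of the evaluated setup: an AWS Lambda handler serving an MXNet model, with the model loaded at module scope so that warm invocations reuse it while cold starts absorb the full initialization delay. The handler shape, input format, and model choice are illustrative assumptions, not the paper's exact workload.

```python
# A minimal sketch of serving an MXNet model from AWS Lambda. Loading the
# model at module scope means the cost is paid once per container: warm
# invocations reuse it, while cold starts pay the initialization delay the
# paper measures.
import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Executed on cold start only, when Lambda creates a fresh container.
net = vision.resnet18_v1(pretrained=True)

def lambda_handler(event, context):
    # `event["pixels"]` is assumed to hold a flattened 224x224 RGB image.
    x = mx.nd.array(event["pixels"]).reshape((1, 3, 224, 224))
    probs = net(x).softmax()
    return {"top_class": int(probs.argmax(axis=1).asscalar())}
```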
Rise of the Planet of Serverless Computing: A Systematic Review
Serverless computing is an emerging cloud computing paradigm that is being adopted to develop a wide range of software applications.
It allows developers to focus on application logic at the granularity of individual functions, thereby freeing them from tedious and
error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment
of serverless-based applications, and substantial research effort has been devoted to tackling them. This paper provides a
comprehensive literature review to characterize the current research state of serverless computing. Specifically, this paper covers 164
papers on 17 research directions of serverless computing, including performance optimization, programming framework, application
migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus, and commonly-used platforms
for serverless computing, as well as promising research opportunities.
Reinforcement Learning (RL) Augmented Cold Start Frequency Reduction in Serverless Computing
Function-as-a-Service is a cloud computing paradigm offering an event-driven
execution model to applications. It features serverless attributes by
eliminating resource management responsibilities from developers and offers
transparent and on-demand scalability of applications. Typical serverless
applications have stringent response time and scalability requirements and
therefore rely on deployed services to provide quick and fault-tolerant
feedback to clients. However, the FaaS paradigm suffers from cold starts as
there is a non-negligible delay associated with on-demand function
initialization. This work focuses on reducing the frequency of cold starts on
the platform by using Reinforcement Learning. Our approach uses Q-learning and
considers metrics such as function CPU utilization, existing function
instances, and response failure rate to proactively initialize functions
ahead of expected demand. The proposed solution was implemented on
Kubeless and was evaluated using a normalised real-world function demand trace
with matrix multiplication as the workload. The results demonstrate
favourable performance of the RL-based agent compared to Kubeless' default
policy and a function keep-alive policy, improving throughput by up to 8.81%
and reducing computation load and resource wastage by up to 55% and 37%,
respectively, as a direct outcome of reduced cold starts.
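As a rough illustration of the approach, the sketch below implements tabular Q-learning over a discretized state (the kind of CPU-utilization, instance-count, and failure-rate features the paper names) with actions that add or remove warm instances. The state buckets, action set, and reward shaping are simplified assumptions, not the paper's exact formulation.

```python
# A minimal tabular Q-learning sketch of the paper's idea: learn how many
# function instances to keep warm from observed load.
import random
from collections import defaultdict

ACTIONS = (-1, 0, +1)          # remove / keep / add one warm instance
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)         # (state, action) -> estimated value

def choose_action(state):
    # Epsilon-greedy policy over the discrete action set.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Standard one-step Q-learning backup.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

def reward(cold_starts, idle_instances):
    # Penalize both cold starts (latency) and idle warm instances (waste);
    # the weights are assumptions.
    return -5.0 * cold_starts - 1.0 * idle_instances
```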
Benchmarking Resource Management For Serverless Computing
Serverless computing is a way in which users or companies can build and run applications and services without having to worry about acquiring or maintaining servers and their software stacks. This is a significant innovation because server management incurs substantial overhead and can be complex and difficult to get right. The serverless model also allows for fine-grained billing and on-demand resource allocation, enabling better scalability and cost reduction. Academic researchers and industry practitioners agree that serverless computing is an important innovation, but it introduces new challenges. The algorithms and protocols currently deployed for virtual server optimization in traditional cloud computing environments cannot simultaneously achieve low latency, high throughput, and fine-grained scalability while keeping costs low for cloud service providers. Furthermore, in the serverless computing paradigm, computation units (i.e., functions) are stateless. Applications, specified through function workflows, have no control over specific states or their scheduling and placement, which can lead to significant latency increases and forgo opportunities to optimize the usage of physical servers. These challenges highlight the tension between giving programmers control and allowing providers to optimize automatically.

This research identifies some of the challenges in exploring new resource management approaches for serverless computing (more specifically, FaaS) and attempts to address one of them. Our experimental approach deploys an open-source serverless function framework, OpenFaaS. We focus on faasd, a lightweight variant of OpenFaaS, chosen over standard OpenFaaS to avoid the complexity and cost of Kubernetes.

As researchers in academia and industry develop new approaches for optimizing the usage of CPU, memory, and I/O on serverless platforms, the community needs established benchmark workloads for evaluating proposed methods. Several research groups have proposed benchmark suites in the last two years, and many others are still in development. A commonality among these benchmark tools is their complexity; junior researchers without experience deploying distributed systems spend substantial time and effort on deployment alone, hindering their progress in evaluating newly proposed ideas. In our work, we demonstrate that even well-regarded proposals still exhibit deficiencies and deployment challenges, and we propose that a simplified, constrained benchmark can be useful in preparing execution environments for the experimental evaluation of serverless services.
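In that spirit, a deliberately constrained benchmark can be very small. The sketch below repeatedly invokes one function through an OpenFaaS/faasd gateway and reports median and tail latency; OpenFaaS exposes deployed functions under /function/<name>, while the gateway address and function name here are deployment-specific assumptions.

```python
# A minimal constrained benchmark: invoke one pre-deployed function through
# the OpenFaaS/faasd gateway repeatedly and summarize the latency samples.
import statistics
import time

import requests

GATEWAY = "http://127.0.0.1:8080"      # default faasd gateway address
FUNCTION = "matmul"                    # assumed pre-deployed function

def invoke_once(payload=b""):
    start = time.perf_counter()
    resp = requests.post(f"{GATEWAY}/function/{FUNCTION}", data=payload)
    elapsed = time.perf_counter() - start
    resp.raise_for_status()
    return elapsed

def benchmark(repetitions=100):
    latencies = sorted(invoke_once() for _ in range(repetitions))
    return {
        "median_s": statistics.median(latencies),
        "p99_s": latencies[int(0.99 * (len(latencies) - 1))],
    }

if __name__ == "__main__":
    print(benchmark())
```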
Adapting Microservices in the Cloud with FaaS
This project benchmarks microservices and Function-as-a-Service (FaaS) deployments along the dimensions of performance and cost. To enable this comparison, the paper proposes a benchmark framework.
QoS-Aware Resource Management for Multi-phase Serverless Workflows with Aquatope
Multi-stage serverless applications, i.e., workflows with many computation
and I/O stages, are becoming increasingly representative of FaaS platforms.
Despite their advantages in terms of fine-grained scalability and modular
development, these applications are subject to suboptimal performance, resource
inefficiency, and high costs to a larger degree than previous simple serverless
functions.
We present Aquatope, a QoS-and-uncertainty-aware resource scheduler for
end-to-end serverless workflows that takes into account the inherent
uncertainty present in FaaS platforms, and improves performance predictability
and resource efficiency. Aquatope uses a set of scalable and validated Bayesian
models to create pre-warmed containers ahead of function invocations, and to
allocate appropriate resources at function granularity to meet a complex
workflow's end-to-end QoS, while minimizing resource cost. Across a diverse set
of analytics and interactive multi-stage serverless workloads, Aquatope
significantly outperforms prior systems, reducing QoS violations by 5x, and
cost by 34% on average and up to 52%, compared to other QoS-meeting methods.
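To make the pre-warming idea concrete, here is a deliberately simplified stand-in for this style of model: a Beta-Bernoulli posterior of invocation probability per time-of-day bucket, with a container pre-warmed when the posterior mean crosses a threshold. Aquatope's actual Bayesian models are richer; the hourly buckets and the threshold are assumptions.

```python
# Simplified stand-in for Bayesian pre-warming: track a Beta-Bernoulli
# posterior of "was the function invoked?" per hourly bucket and pre-warm
# when the posterior mean is high enough.
from collections import defaultdict

class PrewarmModel:
    def __init__(self, threshold=0.5):
        # Beta(1, 1) prior per hourly bucket: [successes, failures].
        self.counts = defaultdict(lambda: [1, 1])
        self.threshold = threshold

    def observe(self, hour, invoked):
        alpha, beta = self.counts[hour]
        self.counts[hour] = [alpha + invoked, beta + (not invoked)]

    def should_prewarm(self, hour):
        alpha, beta = self.counts[hour]
        return alpha / (alpha + beta) > self.threshold   # posterior mean

model = PrewarmModel()
for _ in range(20):
    model.observe(hour=9, invoked=True)    # busy morning hour
print(model.should_prewarm(hour=9))        # True: pre-warm ahead of demand
```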
RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing
The rigid MPI programming model and batch scheduling dominate
high-performance computing. While clouds brought new levels of elasticity into
the world of computing, supercomputers still suffer from low resource
utilization rates. To enhance supercomputing clusters with the benefits of
serverless computing, a modern cloud programming paradigm for pay-as-you-go
execution of stateless functions, we present rFaaS, the first RDMA-aware
Function-as-a-Service (FaaS) platform. With hot invocations and decentralized
function placement, we overcome the major performance limitations of FaaS
systems and provide low-latency remote invocations in multi-tenant
environments. We evaluate the new serverless system through a series of
microbenchmarks and show that remote functions execute with negligible
performance overheads. We demonstrate how serverless computing can bring
elastic resource management into MPI-based high-performance applications.
Overall, our results show that MPI applications can benefit from modern cloud
programming paradigms to guarantee high performance at lower resource costs.
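The hot-invocation idea can be sketched independently of RDMA. The toy client below (in Python, for consistency with the other examples; rFaaS itself targets RDMA-capable networks) reuses connected executors on the hot path and only pays allocation latency on the cold path. Class and method names are illustrative assumptions, not rFaaS's API.

```python
# Toy illustration of "hot" invocations: reuse pre-allocated executors with
# live connections; fall back to the expensive "cold" allocation path only
# when none exist.
import time

class Executor:
    def __init__(self, name):
        self.name = name

    def invoke(self, payload):
        return f"{self.name} processed {payload!r}"

class Client:
    def __init__(self):
        self.hot_pool = []               # executors with live connections

    def allocate_cold(self):
        time.sleep(0.05)                 # stand-in for allocation latency
        return Executor(f"exec-{len(self.hot_pool)}")

    def invoke(self, payload):
        # Hot path: reuse a connected executor with negligible overhead.
        if self.hot_pool:
            return self.hot_pool[0].invoke(payload)
        # Cold path: allocate, then keep the executor hot for next time.
        executor = self.allocate_cold()
        self.hot_pool.append(executor)
        return executor.invoke(payload)

client = Client()
print(client.invoke(b"matrix-block"))    # cold: pays allocation latency
print(client.invoke(b"matrix-block"))    # hot: reuses the connection
```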
Performance Evaluation of Serverless Applications and Infrastructures
Context. Cloud computing has become the de facto standard for deploying modern web-based software systems, which makes its performance crucial to the efficient functioning of many applications. However, the unabated growth of established cloud services, such as Infrastructure-as-a-Service (IaaS), and the emergence of new serverless services, such as Function-as-a-Service (FaaS), have led to an unprecedented diversity of cloud services with different performance characteristics. Measuring these characteristics is difficult in dynamic cloud environments due to performance variability in large-scale distributed systems with limited observability.

Objective. This thesis aims to enable reproducible performance evaluation of serverless applications and their underlying cloud infrastructure.

Method. A combination of literature review and empirical research established a consolidated view on serverless applications and their performance. New solutions were developed through engineering research and used to conduct performance benchmarking field experiments in cloud environments.

Findings. The review of 112 FaaS performance studies from academic and industrial sources found a strong focus on a single cloud platform using artificial micro-benchmarks and discovered that most studies do not follow reproducibility principles for cloud experimentation. Characterizing 89 serverless applications revealed that they are most commonly used for short-running tasks with low data volume and bursty workloads. A novel trace-based serverless application benchmark shows that external service calls often dominate the median end-to-end latency and cause long tail latency. The latency breakdown analysis further identifies performance challenges of serverless applications, such as long delays through asynchronous function triggers, substantial runtime initialization for cold starts, increased performance variability under bursty workloads, and heavily provider-dependent performance characteristics. The evaluation of different cloud benchmarking methodologies shows that only selected micro-benchmarks are suitable for estimating application performance, that performance variability depends on the resource type, and that batch testing on the same instance with repetitions should be used for reliable performance testing.

Conclusions. The insights of this thesis can guide practitioners in building performance-optimized serverless applications and researchers in reproducibly evaluating cloud performance using suitable execution methodologies and different benchmark types.
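The recommended batch-testing methodology is easy to picture in miniature: invoke the same deployed function repeatedly, and keep the first (potentially cold) invocation separate so it does not distort the warm-latency estimate. The invoke placeholder below stands in for a provider-specific call; the repetition count is an assumption.

```python
# Sketch of batch testing with repetitions: separate the first (potentially
# cold) invocation from the warm samples before summarizing.
import statistics
import time

def invoke():
    # Placeholder: substitute an HTTP call to a deployed function.
    time.sleep(0.01)
    return "ok"

def batch_test(repetitions=30):
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        invoke()
        samples.append(time.perf_counter() - start)
    cold, warm = samples[0], samples[1:]
    return {
        "cold_s": cold,
        "warm_median_s": statistics.median(warm),
        "warm_stdev_s": statistics.stdev(warm),
    }

print(batch_test())
```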