
    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal -- suffering from "cold start" latency. A major cause of such inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework and show up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement. Comment: In Proceedings CLOUD 201
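
    The core mechanism described above is a shared, persistent model store that functions consult before loading weights themselves. The Python sketch below illustrates that idea only at a schematic level; the names (ModelStore, get_or_load, slow_model_load) are illustrative assumptions, not TrIMS APIs, and the real system spans GPU/CPU/storage tiers rather than a single in-process dictionary.

    import threading
    import time

    class ModelStore:
        """Keeps loaded models resident so warm invocations skip the reload cost."""

        def __init__(self):
            self._lock = threading.Lock()
            self._cache = {}  # model_id -> loaded model object

        def get_or_load(self, model_id, loader):
            with self._lock:
                model = self._cache.get(model_id)
            if model is not None:              # fast path: model already resident
                return model
            model = loader(model_id)           # slow path: load once, then share
            with self._lock:
                return self._cache.setdefault(model_id, model)

    def slow_model_load(model_id):
        # Stand-in for deserializing large weights from disk or cloud storage.
        time.sleep(0.5)
        return {"id": model_id, "weights": [0.0] * 1000}

    store = ModelStore()

    def handler(event):
        # Each invocation reuses the shared model instead of paying the load cost.
        model = store.get_or_load("resnet50", slow_model_load)
        return {"model": model["id"], "inputs": len(event.get("input", []))}

    if __name__ == "__main__":
        t0 = time.time(); handler({"input": [1, 2, 3]})   # cold: pays the load
        t1 = time.time(); handler({"input": [4, 5, 6]})   # warm: cache hit
        print(f"cold {t1 - t0:.2f}s, warm {time.time() - t1:.4f}s")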

    Serving deep learning models in a serverless platform

    Serverless computing has emerged as a compelling paradigm for the development and deployment of a wide range of event-based cloud applications. At the same time, cloud providers and enterprise companies are heavily adopting machine learning and Artificial Intelligence either to differentiate themselves or to provide their customers with value-added services. In this work we evaluate the suitability of a serverless computing environment for the inferencing of large neural network models. Our experimental evaluations are executed on the AWS Lambda environment using the MXNet deep learning framework. Our results show that while the inferencing latency can be within an acceptable range, longer delays due to cold starts can skew the latency distribution and hence risk violating more stringent SLAs.
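
    The cold-start effect measured in this work comes largely from per-container initialization. A minimal, hypothetical Lambda-style handler is sketched below to show where that cost sits: module-level work runs once per container, so only the first request in a fresh container pays for it. The model load here is a stub, not the paper's code; a real deployment would deserialize MXNet weights at that point instead.

    import time

    def _load_model():
        time.sleep(2.0)                        # stand-in for downloading/deserializing weights
        return lambda pixels: sum(pixels)      # stand-in for a forward pass

    _MODEL = _load_model()                     # runs once per container: the cold-start cost

    def lambda_handler(event, context):
        start = time.time()
        prediction = _MODEL(event.get("pixels", [0]))
        return {"prediction": prediction, "inference_ms": (time.time() - start) * 1000}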

    Rise of the Planet of Serverless Computing: A Systematic Review

    Serverless computing is an emerging cloud computing paradigm, being adopted to develop a wide range of software applications. It allows developers to focus on the application logic at the granularity of functions, thereby freeing developers from tedious and error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment of serverless-based applications, and enormous research efforts have been devoted to tackling them. This paper provides a comprehensive literature review to characterize the current research state of serverless computing. Specifically, it covers 164 papers on 17 research directions of serverless computing, including performance optimization, programming frameworks, application migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus, and commonly-used platforms for serverless computing, as well as promising research opportunities.

    Reinforcement Learning (RL) Augmented Cold Start Frequency Reduction in Serverless Computing

    Function-as-a-Service (FaaS) is a cloud computing paradigm offering an event-driven execution model to applications. It features serverless attributes by eliminating resource management responsibilities from developers and offers transparent and on-demand scalability of applications. Typical serverless applications have stringent response time and scalability requirements and therefore rely on deployed services to provide quick and fault-tolerant feedback to clients. However, the FaaS paradigm suffers from cold starts, as there is a non-negligible delay associated with on-demand function initialization. This work focuses on reducing the frequency of cold starts on the platform by using Reinforcement Learning. Our approach uses Q-learning and considers metrics such as function CPU utilization, existing function instances, and response failure rate to proactively initialize functions in advance based on the expected demand. The proposed solution was implemented on Kubeless and was evaluated using a normalised real-world function demand trace with matrix multiplication as the workload. The results demonstrate a favourable performance of the RL-based agent compared to Kubeless' default policy and a function keep-alive policy, improving throughput by up to 8.81% and reducing computation load and resource wastage by up to 55% and 37%, respectively, which is a direct outcome of reduced cold starts. Comment: 13 figures, 10 pages, 3 table
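
    A minimal sketch of the Q-learning formulation suggested by the abstract follows; the state discretization, action set, and reward shape are assumptions based on the metrics mentioned (CPU utilization, existing instances, response failure rate), not the paper's actual implementation.

    import random
    from collections import defaultdict

    ACTIONS = [0, 1, 2, 4]            # candidate numbers of instances to pre-warm
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def discretize(cpu_util, instances, failure_rate):
        # Bucket raw platform metrics into a small, hashable state.
        return (int(cpu_util * 10), min(instances, 8), int(failure_rate * 10))

    def choose_action(state):
        if random.random() < EPSILON:
            return random.choice(ACTIONS)           # explore
        return max(Q[state], key=Q[state].get)      # exploit the best-known action

    def update(state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(Q[next_state].values())
        Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

    def reward_fn(cold_starts, idle_instances):
        # Penalize cold starts (latency) and idle capacity (resource waste).
        return -1.0 * cold_starts - 0.2 * idle_instances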

    Benchmarking Resource Management For Serverless Computing

    Serverless computing is a way in which users or companies can build and run applications and services without having to worry about acquiring or maintaining servers and their software stacks. This new technology is a significant innovation because server management incurs a large amount of overhead and can be very complex and difficult to work with. The serverless model also allows for fine-grained billing and on-demand resource allocation, allowing for better scalability and cost reduction. Academic researchers and industry practitioners agree that serverless computing is a major innovation, but it introduces new challenges. The algorithms and protocols currently deployed for virtual server optimization in traditional cloud computing environments are not able to simultaneously achieve low latency, high throughput, and fine-grained scalability while maintaining low cost for the cloud service providers. Furthermore, in the serverless computing paradigm, computation units (i.e., functions) are stateless. Applications, specified through function workflows, do not have control over specific states or their scheduling and placement, which can lead to significant latency increases and to missed opportunities to optimize the usage of physical servers. Overcoming these challenges highlights some of the tension between giving programmers control and allowing providers to optimize automatically. This research identifies some of the challenges in exploring new resource management approaches for serverless computing (more specifically, FaaS) and attempts to address one of them. Our experimental approach includes the deployment of an open-source serverless function framework, OpenFaaS. We focus on faasd, a lightweight variant of OpenFaaS, chosen over standard OpenFaaS because it avoids the complexity and cost of Kubernetes. As researchers in academia and industry develop new approaches for optimizing the usage of CPU, memory, and I/O for serverless platforms, the community needs to establish benchmark workloads for evaluating proposed methods. Several research groups have proposed benchmark suites in the last two years, and many others are still in development. A commonality among these benchmark tools is their complexity; for junior researchers without experience in the deployment of distributed systems, a lot of time and effort goes into deploying the benchmarks, hindering their progress in evaluating newly proposed ideas. In our work, we demonstrate that even well-regarded proposals still introduce deficiencies and deployment challenges, and we propose that a simplified, constrained benchmark can be useful in preparing execution environments for the experimental evaluation of serverless services.
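
    In the spirit of the simplified benchmark argued for above, the sketch below invokes a function through the OpenFaaS/faasd gateway and reports latency statistics. The gateway address and function name are placeholders for whatever is deployed in the local faasd environment; this is an illustrative measurement loop, not the benchmark developed in this work.

    import json
    import statistics
    import time
    import urllib.request

    GATEWAY = "http://127.0.0.1:8080"    # default faasd gateway address (assumption)
    FUNCTION = "figlet"                  # name of an already-deployed function

    def invoke(payload: bytes) -> float:
        # Time one synchronous invocation through the gateway, in milliseconds.
        req = urllib.request.Request(f"{GATEWAY}/function/{FUNCTION}", data=payload)
        start = time.perf_counter()
        with urllib.request.urlopen(req) as resp:
            resp.read()
        return (time.perf_counter() - start) * 1000

    if __name__ == "__main__":
        latencies = [invoke(b"hello") for _ in range(50)]
        print(json.dumps({
            "p50_ms": statistics.median(latencies),
            "p95_ms": statistics.quantiles(latencies, n=20)[18],
            "mean_ms": statistics.mean(latencies),
        }, indent=2))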

    Adapting Microservices in the Cloud with FaaS

    This project benchmarks microservices and Function-as-a-Service (FaaS) across the dimensions of performance and cost. To enable this comparison, the paper proposes a benchmarking framework.

    QoS-Aware Resource Management for Multi-phase Serverless Workflows with Aquatope

    Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are becoming increasingly representative of FaaS platforms. Despite their advantages in terms of fine-grained scalability and modular development, these applications are subject to suboptimal performance, resource inefficiency, and high costs to a larger degree than previous simple serverless functions. We present Aquatope, a QoS- and uncertainty-aware resource scheduler for end-to-end serverless workflows that takes into account the inherent uncertainty present in FaaS platforms, and improves performance predictability and resource efficiency. Aquatope uses a set of scalable and validated Bayesian models to create pre-warmed containers ahead of function invocations, and to allocate appropriate resources at function granularity to meet a complex workflow's end-to-end QoS, while minimizing resource cost. Across a diverse set of analytics and interactive multi-stage serverless workloads, Aquatope significantly outperforms prior systems, reducing QoS violations by 5x and cost by 34% on average and up to 52% compared to other QoS-meeting methods.
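
    The scheduling idea above can be pictured as a control loop that predicts upcoming demand per workflow stage and warms containers before requests arrive. The sketch below uses a trivial mean-plus-deviation predictor purely as a placeholder for Aquatope's Bayesian models; the stage names and the prewarm_plan helper are illustrative assumptions, not part of the system.

    import math
    from collections import deque

    # Recent per-stage arrival counts (one sample per scheduling interval).
    history = {"resize": deque(maxlen=60), "classify": deque(maxlen=60)}

    def predict_demand(stage: str) -> float:
        # Placeholder predictor: recent mean plus one standard deviation as a
        # crude uncertainty margin (the real system uses Bayesian models).
        samples = list(history[stage]) or [0]
        mean = sum(samples) / len(samples)
        var = sum((x - mean) ** 2 for x in samples) / len(samples)
        return mean + math.sqrt(var)

    def prewarm_plan(current_warm: dict) -> dict:
        # Extra containers to start per stage before the next burst of requests.
        return {
            stage: max(0, math.ceil(predict_demand(stage)) - current_warm.get(stage, 0))
            for stage in history
        }

    if __name__ == "__main__":
        history["resize"].extend([3, 4, 5, 4])
        history["classify"].extend([1, 1, 2, 1])
        print(prewarm_plan({"resize": 2, "classify": 1}))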

    RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing

    The rigid MPI programming model and batch scheduling dominate high-performance computing. While clouds brought new levels of elasticity into the world of computing, supercomputers still suffer from low resource utilization rates. To enhance supercomputing clusters with the benefits of serverless computing, a modern cloud programming paradigm for pay-as-you-go execution of stateless functions, we present rFaaS, the first RDMA-aware Function-as-a-Service (FaaS) platform. With hot invocations and decentralized function placement, we overcome the major performance limitations of FaaS systems and provide low-latency remote invocations in multi-tenant environments. We evaluate the new serverless system through a series of microbenchmarks and show that remote functions execute with negligible performance overheads. We demonstrate how serverless computing can bring elastic resource management into MPI-based high-performance applications. Overall, our results show that MPI applications can benefit from modern cloud programming paradigms to guarantee high performance at lower resource costs.

    Performance Evaluation of Serverless Applications and Infrastructures

    Context. Cloud computing has become the de facto standard for deploying modern web-based software systems, which makes its performance crucial to the efficient functioning of many applications. However, the unabated growth of established cloud services, such as Infrastructure-as-a-Service (IaaS), and the emergence of new serverless services, such as Function-as-a-Service (FaaS), has led to an unprecedented diversity of cloud services with different performance characteristics. Measuring these characteristics is difficult in dynamic cloud environments due to performance variability in large-scale distributed systems with limited observability.
    Objective. This thesis aims to enable reproducible performance evaluation of serverless applications and their underlying cloud infrastructure.
    Method. A combination of literature review and empirical research established a consolidated view on serverless applications and their performance. New solutions were developed through engineering research and used to conduct performance benchmarking field experiments in cloud environments.
    Findings. The review of 112 FaaS performance studies from academic and industrial sources found a strong focus on a single cloud platform using artificial micro-benchmarks and discovered that most studies do not follow reproducibility principles on cloud experimentation. Characterizing 89 serverless applications revealed that they are most commonly used for short-running tasks with low data volume and bursty workloads. A novel trace-based serverless application benchmark shows that external service calls often dominate the median end-to-end latency and cause long tail latency. The latency breakdown analysis further identifies performance challenges of serverless applications, such as long delays through asynchronous function triggers, substantial runtime initialization for cold starts, increased performance variability under bursty workloads, and heavily provider-dependent performance characteristics. The evaluation of different cloud benchmarking methodologies has shown that only selected micro-benchmarks are suitable for estimating application performance, performance variability depends on the resource type, and batch testing on the same instance with repetitions should be used for reliable performance testing.
    Conclusions. The insights of this thesis can guide practitioners in building performance-optimized serverless applications and researchers in reproducibly evaluating cloud performance using suitable execution methodologies and different benchmark types.