Reinforcement Learning (RL) Augmented Cold Start Frequency Reduction in Serverless Computing
Function-as-a-Service is a cloud computing paradigm offering an event-driven
execution model to applications. It features serverless attributes by
eliminating resource management responsibilities from developers and offers
transparent and on-demand scalability of applications. Typical serverless
applications have stringent response time and scalability requirements and
therefore rely on deployed services to provide quick and fault-tolerant
feedback to clients. However, the FaaS paradigm suffers from cold starts as
there is a non-negligible delay associated with on-demand function
initialization. This work focuses on reducing the frequency of cold starts on
the platform by using Reinforcement Learning. Our approach uses Q-learning and
considers metrics such as function CPU utilization, existing function
instances, and response failure rate to proactively initialize functions in
advance based on the expected demand. The proposed solution was implemented on
Kubeless and was evaluated using a normalised real-world function demand trace
with matrix multiplication as the workload. The results demonstrate favourable
performance of the RL-based agent compared to Kubeless' default policy and a
function keep-alive policy, improving throughput by up to 8.81% and reducing
computation load and resource wastage by up to 55% and 37%, respectively, as a
direct outcome of fewer cold starts.
Comment: 13 figures, 10 pages, 3 tables
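A minimal sketch of how such a Q-learning agent could be structured is given below. The state discretization, the three-action set, and the reward weights are illustrative assumptions, not the paper's exact formulation.

import random
from collections import defaultdict

# Illustrative tabular Q-learning agent for proactive function pre-warming.
# State: (cpu bucket, warm instance count, failure-rate bucket); action: change in warm instances.
ACTIONS = [-1, 0, +1]          # remove one instance, keep as is, pre-warm one instance
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)         # Q[(state, action)] -> estimated value

def discretize(cpu_util, instances, failure_rate):
    # Bucket the continuous metrics so the Q-table stays small (assumed granularity).
    return (int(cpu_util * 10), min(instances, 20), int(failure_rate * 10))

def choose_action(state):
    if random.random() < EPSILON:                      # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])   # exploit

def reward(cold_starts, idle_instances):
    # Penalize cold starts heavily and idle (wasted) instances mildly.
    return -5.0 * cold_starts - 0.5 * idle_instances

def update(state, action, r, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])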
Function-as-a-Service Performance Evaluation: A Multivocal Literature Review
Function-as-a-Service (FaaS) is one form of the serverless cloud computing
paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing
event-triggered code snippets (i.e., functions). Many studies that empirically
evaluate the performance of such FaaS platforms have started to appear but we
are currently lacking a comprehensive understanding of the overall domain. To
address this gap, we conducted a multivocal literature review (MLR) covering
112 studies from academic (51) and grey (61) literature. We find that existing
work mainly studies the AWS Lambda platform and focuses on micro-benchmarks
using simple functions to measure CPU speed and FaaS platform overhead (i.e.,
container cold starts). Further, we discover a mismatch between academic and
industrial sources on tested platform configurations, find that function
triggers remain insufficiently studied, and identify HTTP API gateways and
cloud storages as the most used external service integrations. Following
existing guidelines on experimentation in cloud systems, we discover many flaws
threatening the reproducibility of experiments presented in the surveyed
studies. We conclude with a discussion of gaps in literature and highlight
methodological suggestions that may serve to improve future FaaS performance
evaluation studies.
Comment: improvements including postprint update
A Data-parallel Approach for Efficient Resource Utilization in Distributed Serverless Deep Learning
Serverless computing is an integral part of the recent success of cloud computing, offering cost and performance efficiency for small- and large-scale distributed systems. Owing to developers' increasing interest in integrating distributed computing techniques into deep learning frameworks for better performance, serverless infrastructures have become the choice of many to host their applications. However, this computing architecture imposes resource limitations that challenge the successful completion of many deep learning jobs.
In our research, we present an approach that addresses timeout and memory limitations, two key issues for deep learning on serverless infrastructures. Focusing on Apache OpenWhisk as the serverless platform and TensorFlow as the deep learning framework, our solution follows an in-depth assessment of the platform and of failed attempts at tackling its resource constraints through system-level modifications. The proposed approach employs data parallelism and ensures the concurrent execution of separate cloud functions. A weighted averaging of the intermediate models is then applied to build an ensemble model ready for evaluation. Through a fine-grained system design, our solution executed and completed deep learning workflows on OpenWhisk with a 0% failure rate. Moreover, a comparison with a traditional deployment on OpenWhisk shows that our approach uses 45% less memory and reduces execution time by 58%.
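As a rough illustration of the weighted-averaging step described above, the sketch below combines parameter sets returned by concurrently executed cloud functions. The weighting by data-shard size and the flat layer-name-to-array representation are assumptions for illustration, not the authors' exact implementation.

import numpy as np

def weighted_average(models, weights):
    """Combine per-function model parameters into one ensemble model.

    models  -- list of dicts mapping layer name -> np.ndarray (one per cloud function)
    weights -- relative weight per partial model, e.g. the data-shard sizes
    """
    total = float(sum(weights))
    merged = {}
    for name in models[0]:
        merged[name] = sum(w * m[name] for m, w in zip(models, weights)) / total
    return merged

# Example: three functions trained on shards of 1000, 800, and 1200 samples.
# merged = weighted_average([m1, m2, m3], weights=[1000, 800, 1200])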
QoS-Aware Resource Management for Multi-phase Serverless Workflows with Aquatope
Multi-stage serverless applications, i.e., workflows with many computation
and I/O stages, are becoming increasingly representative of FaaS platforms.
Despite their advantages in terms of fine-grained scalability and modular
development, these applications are subject to suboptimal performance, resource
inefficiency, and high costs to a larger degree than previous simple serverless
functions.
We present Aquatope, a QoS-and-uncertainty-aware resource scheduler for
end-to-end serverless workflows that takes into account the inherent
uncertainty present in FaaS platforms, and improves performance predictability
and resource efficiency. Aquatope uses a set of scalable and validated Bayesian
models to create pre-warmed containers ahead of function invocations, and to
allocate appropriate resources at function granularity to meet a complex
workflow's end-to-end QoS, while minimizing resource cost. Across a diverse set
of analytics and interactive multi-stage serverless workloads, Aquatope
significantly outperforms prior systems, reducing QoS violations by 5x and cost
by 34% on average (up to 52%) compared to other QoS-meeting methods.
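The abstract does not give Aquatope's model details; a toy sketch of the underlying idea, pre-warming containers only up to the level a probabilistic demand model predicts with enough confidence, might look as follows. The Gaussian predictive distribution and the quantile threshold are assumptions for illustration.

import math
from statistics import NormalDist

def prewarm_count(mean_invocations, std_invocations, quantile=0.9):
    """Pre-warm enough containers to cover the chosen demand quantile,
    assuming a Gaussian predictive distribution for the next interval."""
    if std_invocations <= 0:
        return max(0, round(mean_invocations))
    demand = NormalDist(mean_invocations, std_invocations).inv_cdf(quantile)
    return max(0, math.ceil(demand))

# Example: predicted 4 invocations (+/- 2) in the next interval -> pre-warm ~7 containers.
# prewarm_count(4, 2, quantile=0.9)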
FedLesScan: Mitigating Stragglers in Serverless Federated Learning
Federated Learning (FL) is a machine learning paradigm that enables the
training of a shared global model across distributed clients while keeping the
training data local. While most prior work on designing systems for FL has
focused on using stateful, always-running components, recent work has shown that
components in an FL system can greatly benefit from the usage of serverless
computing and Function-as-a-Service technologies. To this end, distributed
training of models with serverless FL systems can be more resource-efficient
and cheaper than conventional FL systems. However, serverless FL systems still
suffer from the presence of stragglers, i.e., slow clients due to their
resource and statistical heterogeneity. While several strategies have been
proposed for mitigating stragglers in FL, most methodologies do not account for
the particular characteristics of serverless environments, i.e., cold-starts,
performance variations, and the ephemeral stateless nature of the function
instances. Towards this, we propose FedLesScan, a novel clustering-based
semi-asynchronous training strategy, specifically tailored for serverless FL.
FedLesScan dynamically adapts to the behaviour of clients and minimizes the
effect of stragglers on the overall system. We implement our strategy by
extending an open-source serverless FL system called FedLess. Moreover, we
comprehensively evaluate our strategy using the 2nd generation Google Cloud
Functions with four datasets and varying percentages of stragglers. Results
from our experiments show that, compared to other approaches, FedLesScan reduces
training time and cost by an average of 8% and 20%, respectively, while
utilizing clients better, with an average increase in the effective update
ratio of 17.75%.
Comment: IEEE BigData 2022
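FedLesScan's exact algorithm is not spelled out in the abstract; the sketch below only illustrates the general idea of grouping clients by their observed round behaviour and preferring less straggler-prone clusters. The two-feature client representation and the use of k-means are assumptions for illustration.

import numpy as np
from sklearn.cluster import KMeans

def cluster_clients(durations, failures, n_clusters=3):
    """Group clients by mean round duration and failure count (illustrative features)."""
    features = np.column_stack([durations, failures])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)

def select_round_clients(client_ids, labels, durations, per_round=10):
    """Prefer clients from clusters with lower average round duration (fewer stragglers)."""
    order = sorted(set(labels),
                   key=lambda c: np.mean([d for d, l in zip(durations, labels) if l == c]))
    ranked = [cid for c in order for cid, l in zip(client_ids, labels) if l == c]
    return ranked[:per_round]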
Taxonomy of Security and Privacy Issues in Serverless Computing
The advent of cloud computing has led to a new era of computer usage. Networking and physical security are among the IT infrastructure concerns that IT administrators around the world had to worry about in their individual environments. Cloud computing took away that burden and redefined the role of the IT administrator. Serverless computing, as it relates to secure software development, is creating the same kind of change. Developers can quickly spin up a secure development environment in a matter of minutes without having to worry about any of the underlying infrastructure setup. In this paper, we look at the merits and demerits of serverless computing, what is driving the demand for serverless computing among developers, and the security and privacy issues of serverless technology, and we detail the parameters to consider when setting up and using a secure development environment based on serverless computing.
Reducing Execution Time in FaaS Cloud Platforms
Dissertation to obtain the Master's degree in Informatics and Computer Engineering. The increase in popularity of code processing and execution in the Cloud led to growing interest in Google's Functions Framework, with the main objective being to identify possible points of improvement in the platform and to adapt it to respond to the identified need, as well as to obtain and analyse results in order to validate the progress made. As a need of the Google Cloud Platform Functions Framework, it was found that an adaptation would be possible to promote the use of cache services, thus making it possible to take advantage of previous processing of the functions to accelerate the response to future requests. In this way, 3 different caching mechanisms were implemented, In-Process, Out-of-Process, and Network, each responding to different needs and bringing distinct advantages. For the extraction and analysis of results, Apache JMeter was used, an open-source application for running load tests and measuring the performance of the developed system. The test involves executing a function that generates thumbnails from an image, with the function running in the framework. For this case, one of the metrics defined and analysed is the number of requests served per second until reaching the saturation point. Finally, based on the results, it was possible to verify a significant improvement in response times to requests when using the caching mechanisms. For the case study, it was also possible to understand the differences in processing images of small, medium, and large size, on the order of KBs to a few MBs.
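As a minimal illustration of the In-Process variant described above, a cloud function can memoize results in module-level state that survives warm invocations of the same instance. The cache-key scheme and the generate_thumbnail helper are assumptions for illustration, not the dissertation's implementation.

import hashlib

# Module-level dict: persists across warm invocations of the same function instance,
# but is lost on a cold start and not shared between instances (in-process cache).
_CACHE = {}

def handle_request(image_bytes, size=(128, 128)):
    key = hashlib.sha256(image_bytes).hexdigest() + f"-{size[0]}x{size[1]}"
    if key in _CACHE:
        return _CACHE[key]                                # cache hit: skip thumbnail generation
    thumbnail = generate_thumbnail(image_bytes, size)     # hypothetical thumbnail helper
    _CACHE[key] = thumbnail
    return thumbnail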