Reinforcement Learning (RL) Augmented Cold Start Frequency Reduction in Serverless Computing
Function-as-a-Service is a cloud computing paradigm offering an event-driven
execution model to applications. It features serverless attributes by
eliminating resource management responsibilities from developers and offers
transparent and on-demand scalability of applications. Typical serverless
applications have stringent response time and scalability requirements and
therefore rely on deployed services to provide quick and fault-tolerant
feedback to clients. However, the FaaS paradigm suffers from cold starts as
there is a non-negligible delay associated with on-demand function
initialization. This work focuses on reducing the frequency of cold starts on
the platform by using Reinforcement Learning. Our approach uses Q-learning and
considers metrics such as function CPU utilization, existing function
instances, and response failure rate to proactively initialize functions in
advance based on the expected demand. The proposed solution was implemented on
Kubeless and was evaluated using a normalised real-world function demand trace
with matrix multiplication as the workload. The results demonstrate favourable
performance of the RL-based agent compared to Kubeless' default policy and a
function keep-alive policy, improving throughput by up to 8.81% and reducing
computation load and resource wastage by up to 55% and 37%, respectively, as a
direct outcome of fewer cold starts.
Comment: 13 figures, 10 pages, 3 tables
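A minimal sketch of how such a Q-learning agent could be structured is given below. The state discretization, the three-action set, and the reward weights are illustrative assumptions, not the paper's exact formulation.

import random
from collections import defaultdict

# Illustrative tabular Q-learning agent for proactive function pre-warming.
# State: (cpu bucket, warm instance count, failure-rate bucket); action: change in warm instances.
ACTIONS = [-1, 0, +1]          # remove one instance, keep as is, pre-warm one instance
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)         # Q[(state, action)] -> estimated value

def discretize(cpu_util, instances, failure_rate):
    # Bucket the continuous metrics so the Q-table stays small (assumed granularity).
    return (int(cpu_util * 10), min(instances, 20), int(failure_rate * 10))

def choose_action(state):
    if random.random() < EPSILON:                      # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])   # exploit

def reward(cold_starts, idle_instances):
    # Penalize cold starts heavily and idle (wasted) instances mildly.
    return -5.0 * cold_starts - 0.5 * idle_instances

def update(state, action, r, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])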
Function-as-a-Service Performance Evaluation: A Multivocal Literature Review
Function-as-a-Service (FaaS) is one form of the serverless cloud computing
paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing
event-triggered code snippets (i.e., functions). Many studies that empirically
evaluate the performance of such FaaS platforms have started to appear but we
are currently lacking a comprehensive understanding of the overall domain. To
address this gap, we conducted a multivocal literature review (MLR) covering
112 studies from academic (51) and grey (61) literature. We find that existing
work mainly studies the AWS Lambda platform and focuses on micro-benchmarks
using simple functions to measure CPU speed and FaaS platform overhead (i.e.,
container cold starts). Further, we discover a mismatch between academic and
industrial sources on tested platform configurations, find that function
triggers remain insufficiently studied, and identify HTTP API gateways and
cloud storages as the most used external service integrations. Following
existing guidelines on experimentation in cloud systems, we discover many flaws
threatening the reproducibility of experiments presented in the surveyed
studies. We conclude with a discussion of gaps in literature and highlight
methodological suggestions that may serve to improve future FaaS performance
evaluation studies.
Comment: improvements including postprint update
A Data-parallel Approach for Efficient Resource Utilization in Distributed Serverless Deep Learning
Serverless computing is an integral part of the recent success of cloud computing, offering cost and performance efficiency for small- and large-scale distributed systems. Owing to developers' increasing interest in integrating distributed computing techniques into deep learning frameworks for better performance, serverless infrastructures have become the choice of many to host their applications. However, this computing architecture imposes resource limitations that challenge the successful completion of many deep learning jobs.
In our research, we present an approach that addresses timeout and memory limitations, two key issues for deep learning on serverless infrastructures. Focusing on Apache OpenWhisk as the serverless platform and TensorFlow as the deep learning framework, our solution follows an in-depth assessment of the platform and of failed attempts at tackling its resource constraints through system-level modifications. The proposed approach employs data parallelism and ensures the concurrent execution of separate cloud functions. A weighted averaging of the intermediate models is then applied to build an ensemble model ready for evaluation. Through a fine-grained system design, our solution executed and completed deep learning workflows on OpenWhisk with a 0% failure rate. Moreover, a comparison with a traditional deployment on OpenWhisk shows that our approach uses 45% less memory and reduces execution time by 58%.
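As a rough illustration of the weighted-averaging step described above, the sketch below combines parameter sets returned by concurrently executed cloud functions. The weighting by data-shard size and the flat layer-name-to-array representation are assumptions for illustration, not the authors' exact implementation.

import numpy as np

def weighted_average(models, weights):
    """Combine per-function model parameters into one ensemble model.

    models  -- list of dicts mapping layer name -> np.ndarray (one per cloud function)
    weights -- relative weight per partial model, e.g. the data-shard sizes
    """
    total = float(sum(weights))
    merged = {}
    for name in models[0]:
        merged[name] = sum(w * m[name] for m, w in zip(models, weights)) / total
    return merged

# Example: three functions trained on shards of 1000, 800, and 1200 samples.
# merged = weighted_average([m1, m2, m3], weights=[1000, 800, 1200])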
QoS-Aware Resource Management for Multi-phase Serverless Workflows with Aquatope
Multi-stage serverless applications, i.e., workflows with many computation
and I/O stages, are becoming increasingly representative of FaaS platforms.
Despite their advantages in terms of fine-grained scalability and modular
development, these applications are subject to suboptimal performance, resource
inefficiency, and high costs to a larger degree than previous simple serverless
functions.
We present Aquatope, a QoS-and-uncertainty-aware resource scheduler for
end-to-end serverless workflows that takes into account the inherent
uncertainty present in FaaS platforms, and improves performance predictability
and resource efficiency. Aquatope uses a set of scalable and validated Bayesian
models to create pre-warmed containers ahead of function invocations, and to
allocate appropriate resources at function granularity to meet a complex
workflow's end-to-end QoS, while minimizing resource cost. Across a diverse set
of analytics and interactive multi-stage serverless workloads, Aquatope
significantly outperforms prior systems, reducing QoS violations by 5x and cost
by 34% on average (up to 52%) compared to other QoS-meeting methods.
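The abstract does not give Aquatope's model details; a toy sketch of the underlying idea, pre-warming containers only up to the level a probabilistic demand model predicts with enough confidence, might look as follows. The Gaussian predictive distribution and the quantile threshold are assumptions for illustration.

import math
from statistics import NormalDist

def prewarm_count(mean_invocations, std_invocations, quantile=0.9):
    """Pre-warm enough containers to cover the chosen demand quantile,
    assuming a Gaussian predictive distribution for the next interval."""
    if std_invocations <= 0:
        return max(0, round(mean_invocations))
    demand = NormalDist(mean_invocations, std_invocations).inv_cdf(quantile)
    return max(0, math.ceil(demand))

# Example: predicted 4 invocations (+/- 2) in the next interval -> pre-warm ~7 containers.
# prewarm_count(4, 2, quantile=0.9)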
FedLesScan: Mitigating Stragglers in Serverless Federated Learning
Federated Learning (FL) is a machine learning paradigm that enables the
training of a shared global model across distributed clients while keeping the
training data local. While most prior work on designing systems for FL has
focused on using stateful, always-running components, recent work has shown that
components in an FL system can greatly benefit from the usage of serverless
computing and Function-as-a-Service technologies. To this end, distributed
training of models with serverless FL systems can be more resource-efficient
and cheaper than conventional FL systems. However, serverless FL systems still
suffer from the presence of stragglers, i.e., slow clients due to their
resource and statistical heterogeneity. While several strategies have been
proposed for mitigating stragglers in FL, most methodologies do not account for
the particular characteristics of serverless environments, i.e., cold-starts,
performance variations, and the ephemeral stateless nature of the function
instances. Towards this, we propose FedLesScan, a novel clustering-based
semi-asynchronous training strategy, specifically tailored for serverless FL.
FedLesScan dynamically adapts to the behaviour of clients and minimizes the
effect of stragglers on the overall system. We implement our strategy by
extending an open-source serverless FL system called FedLess. Moreover, we
comprehensively evaluate our strategy using the 2nd generation Google Cloud
Functions with four datasets and varying percentages of stragglers. Results
from our experiments show that, compared to other approaches, FedLesScan reduces
training time and cost by an average of 8% and 20%, respectively, while
utilizing clients better, with an average increase in the effective update
ratio of 17.75%.
Comment: IEEE BigData 2022
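FedLesScan's exact algorithm is not spelled out in the abstract; the sketch below only illustrates the general idea of grouping clients by their observed round behaviour and preferring less straggler-prone clusters. The two-feature client representation and the use of k-means are assumptions for illustration.

import numpy as np
from sklearn.cluster import KMeans

def cluster_clients(durations, failures, n_clusters=3):
    """Group clients by mean round duration and failure count (illustrative features)."""
    features = np.column_stack([durations, failures])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)

def select_round_clients(client_ids, labels, durations, per_round=10):
    """Prefer clients from clusters with lower average round duration (fewer stragglers)."""
    order = sorted(set(labels),
                   key=lambda c: np.mean([d for d, l in zip(durations, labels) if l == c]))
    ranked = [cid for c in order for cid, l in zip(client_ids, labels) if l == c]
    return ranked[:per_round]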
Taxonomy of Security and Privacy Issues in Serverless Computing
The advent of cloud computing has led to a new era of computer usage. Networking and physical security are among the IT infrastructure concerns that IT administrators around the world had to worry about in their individual environments. Cloud computing took away that burden and redefined the role of the IT administrator. Serverless computing, as it relates to secure software development, is creating the same kind of change. Developers can quickly spin up a secure development environment in a matter of minutes without having to worry about any of the underlying infrastructure setup. In this paper, we look at the merits and demerits of serverless computing, what is driving the demand for serverless computing among developers, and the security and privacy issues of serverless technology, and we detail the parameters to consider when setting up and using a secure development environment based on serverless computing.
Reducing Execution Time in FaaS Cloud Platforms
Dissertation to obtain the Master's degree in Informatics and Computer Engineering. The increase in popularity of code processing and execution in the Cloud led to growing interest in Google's Functions Framework, with the main objective being to identify possible points of improvement in the platform and to adapt it to respond to the identified need, as well as to obtain and analyse results in order to validate the progress made. As a need of the Google Cloud Platform Functions Framework, it was found that an adaptation would be possible to promote the use of cache services, thus making it possible to take advantage of previous processing of the functions to accelerate the response to future requests. In this way, 3 different caching mechanisms were implemented, In-Process, Out-of-Process, and Network, each responding to different needs and bringing distinct advantages. For the extraction and analysis of results, Apache JMeter was used, an open-source application for running load tests and measuring the performance of the developed system. The test involves executing a function that generates thumbnails from an image, with the function running in the framework. For this case, one of the metrics defined and analysed is the number of requests served per second until reaching the saturation point. Finally, based on the results, it was possible to verify a significant improvement in response times to requests when using the caching mechanisms. For the case study, it was also possible to understand the differences in processing images of small, medium, and large size, on the order of KBs to a few MBs.
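As a minimal illustration of the In-Process variant described above, a cloud function can memoize results in module-level state that survives warm invocations of the same instance. The cache-key scheme and the generate_thumbnail helper are assumptions for illustration, not the dissertation's implementation.

import hashlib

# Module-level dict: persists across warm invocations of the same function instance,
# but is lost on a cold start and not shared between instances (in-process cache).
_CACHE = {}

def handle_request(image_bytes, size=(128, 128)):
    key = hashlib.sha256(image_bytes).hexdigest() + f"-{size[0]}x{size[1]}"
    if key in _CACHE:
        return _CACHE[key]                                # cache hit: skip thumbnail generation
    thumbnail = generate_thumbnail(image_bytes, size)     # hypothetical thumbnail helper
    _CACHE[key] = thumbnail
    return thumbnail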