No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
Serverless platforms essentially face a tradeoff between container startup
time and provisioned concurrency (i.e., cached instances), which is further
exacerbated by the frequent need for remote container initialization. This
paper presents MITOSIS, an operating system primitive that provides fast remote
fork, which exploits a deep codesign of the OS kernel with RDMA. By leveraging
the fast remote read capability of RDMA and partial state transfer across
serverless containers, MITOSIS bridges the performance gap between local and
remote container initialization. MITOSIS is the first to fork over 10,000 new
containers from one instance across multiple machines within a second, while
allowing the new containers to efficiently transfer the pre-materialized states
of the forked one. We have implemented MITOSIS on Linux and integrated it with
Fn, a popular serverless platform. Under load spikes in real-world serverless
workloads, MITOSIS reduces the function tail latency by 89% with orders of
magnitude lower memory usage. For serverless workflows that require state
transfer, MITOSIS improves their execution time by 86%.

Comment: To appear in OSDI'2
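The lazy-transfer idea behind remote fork can be illustrated with a toy sketch (all names here are hypothetical and do not reflect the MITOSIS API): a child created on another machine starts with only a small descriptor of the parent and pulls the parent's pre-materialized state on first access, with a plain dictionary lookup standing in for a one-sided RDMA read.

```python
# Toy illustration of lazy remote fork (hypothetical names; not the MITOSIS API).
# A child forked on a remote machine holds only a descriptor of its parent and
# materializes state on first touch, the way one-sided RDMA reads fetch remote
# memory without involving the parent's CPU.

class ParentImage:
    """Pre-materialized state of the parent container (stand-in for its memory)."""
    def __init__(self, state):
        self.state = dict(state)

    def rdma_read(self, key):
        # Stand-in for a one-sided RDMA read of a remote page.
        return self.state[key]

class RemoteChild:
    """Child container that fetches parent state lazily, caching what it reads."""
    def __init__(self, parent):
        self.parent = parent   # descriptor: where to read parent state from
        self.local = {}        # state materialized locally so far

    def get(self, key):
        if key not in self.local:          # copy-on-read: fetch only touched keys
            self.local[key] = self.parent.rdma_read(key)
        return self.local[key]

parent = ParentImage({"model": "weights-v1", "warm_cache": [1, 2, 3]})
children = [RemoteChild(parent) for _ in range(3)]
print(children[0].get("model"))  # only this key is transferred to child 0
```

Because the children share one parent image and fetch state on demand, forking many of them is cheap up front, which is the property that lets a remote fork scale to thousands of instances.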
Request Replication for Fault Tolerance in FaaS (Réplication de requêtes pour la tolérance aux pannes de FaaS)
Function-as-a-Service (FaaS) is a popular programming model for building serverless applications, supported by all major cloud providers and many open-source software frameworks. One of the main challenges for FaaS providers is providing fault-tolerance for the deployed applications. The basic fault-tolerance mechanism in current FaaS platforms is automatically retrying function invocations. Although the retry mechanism is well suited for transient faults, it incurs delays in recovering from other types of faults, such as node crashes. This paper proposes the integration of a Request Replication mechanism in FaaS platforms and describes how this integration was implemented in a well-known, open-source platform. The paper provides a detailed experimental comparison of the proposed mechanism with the retry mechanism and an Active-Standby mechanism under different failure scenarios.
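The difference between retry and request replication can be sketched as follows (a minimal illustration, not the paper's implementation): replication sends the invocation to several nodes at once and returns the first successful response, so a crashed node adds no recovery delay.

```python
# Minimal sketch of request replication for FaaS invocations (illustrative only;
# not the implementation described in the paper). The request goes to a primary
# and a replica concurrently; the first successful response wins, so a node
# crash costs no extra recovery time, unlike retry-after-failure.
from concurrent.futures import ThreadPoolExecutor, as_completed

def replicated_invoke(handlers, payload):
    """Invoke `payload` on every handler concurrently; return the first success."""
    with ThreadPoolExecutor(max_workers=len(handlers)) as pool:
        futures = [pool.submit(h, payload) for h in handlers]
        errors = []
        for fut in as_completed(futures):
            try:
                return fut.result()
            except Exception as exc:   # this replica crashed or timed out
                errors.append(exc)
        raise RuntimeError(f"all replicas failed: {errors}")

def crashed_node(payload):
    raise ConnectionError("node down")

def healthy_node(payload):
    return f"ok:{payload}"

result = replicated_invoke([crashed_node, healthy_node], "req-1")
print(result)  # the replica's answer, despite the primary crash
```

The tradeoff the paper evaluates is visible even in this sketch: replication doubles the work per request in exchange for masking the crash entirely, whereas retry pays the full failure-detection delay before the second attempt starts.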
Using Unused: Non-Invasive Dynamic FaaS Infrastructure with HPC-Whisk
Modern HPC workload managers and their careful tuning contribute to the high
utilization of HPC clusters. However, due to inevitable uncertainty it is
impossible to completely avoid node idleness. Although such idle slots are
usually too short for any HPC job, they are too long to ignore. The
Function-as-a-Service (FaaS) paradigm is a promising fit for this gap, as
typical FaaS functions last seconds, not hours. Here we show how
to build a FaaS infrastructure on idle nodes in an HPC cluster in such a way
that it does not affect the performance of the HPC jobs significantly. We
dynamically adapt to a changing set of idle physical machines, by integrating
open-source software Slurm and OpenWhisk.
We designed and implemented a prototype solution that allowed us to cover up
to 90% of the idle time slots on a 50k-core cluster that runs production
workloads.
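A non-invasive integration of this kind can be sketched as a reconciliation loop (hypothetical helper, shown here only to make the idea concrete; the real prototype integrates Slurm and OpenWhisk): each cycle diffs the set of currently idle cluster nodes against the set of nodes running FaaS invokers, starting invokers on newly idle nodes and draining them as soon as the HPC scheduler reclaims a node.

```python
# Sketch of the reconciliation idea behind running FaaS on idle HPC nodes
# (hypothetical helper names; the real prototype integrates Slurm and OpenWhisk).
# Each cycle: nodes that just became idle get a FaaS invoker, and nodes
# reclaimed by the HPC scheduler get their invoker drained, so HPC jobs
# are not disturbed.

def reconcile(idle_nodes, faas_nodes):
    """Return (nodes to start invokers on, nodes to drain invokers from)."""
    idle, running = set(idle_nodes), set(faas_nodes)
    to_start = idle - running   # newly idle: offer them to the FaaS layer
    to_drain = running - idle   # reclaimed by the scheduler: give them back
    return to_start, to_drain

# One cycle: node n3 became idle, node n1 was reclaimed for an HPC job.
start, drain = reconcile(idle_nodes={"n2", "n3"}, faas_nodes={"n1", "n2"})
print(sorted(start), sorted(drain))  # ['n3'] ['n1']
```

Keeping the loop one-directional (the HPC scheduler always wins a contested node) is what makes the approach non-invasive: FaaS work only ever occupies capacity the workload manager has already declared idle.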
Rise of the Planet of Serverless Computing: A Systematic Review
Serverless computing is an emerging cloud computing paradigm, being adopted to develop a wide range of software applications.
It allows developers to focus on the application logic in the granularity of function, thereby freeing developers from tedious and
error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment
of serverless-based applications. Substantial research effort has been devoted to tackling these challenges. This paper provides a
comprehensive literature review to characterize the current research state of serverless computing. Specifically, this paper covers 164
papers on 17 research directions of serverless computing, including performance optimization, programming framework, application
migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus, and commonly-used platforms
for serverless computing, as well as promising research opportunities.
Interaction-Oriented Software Engineering: Programming abstractions for autonomy and decentralization
We review the main ideas and elements of Interaction-Oriented Software Engineering (IOSE), a program of research that we have pursued for the last two decades, a span of time in which it has grown from philosophy to practical programming abstractions. What distinguishes IOSE from any other program of research is its emphasis on supporting autonomy by modeling the meaning of communication and using that as the basis for engineering decentralized sociotechnical systems. Meaning sounds esoteric but is the basis for practical decision making and a holy grail for the field of distributed systems. We describe our contributions so far, directions for research, and the potential for broad impact on computing.