12 research outputs found

    DFlow: Efficient Dataflow-based Invocation Workflow Execution for Function-as-a-Service

    Serverless computing is becoming increasingly popular due to its ease of use and fine-grained billing, features that make it appealing for stateful applications and serverless workflows. However, current serverless workflow systems use a control-flow-based invocation pattern: a function's invocation depends on the execution state of its precursors, so it can only begin executing once all of them have completed. As a result, this pattern can lead to longer end-to-end execution times. We design and implement DFlow, a novel dataflow-based serverless workflow system that achieves high performance for serverless workflows. DFlow introduces a distributed scheduler (DScheduler) that uses a dataflow-based invocation pattern: a function's invocation depends on its data dependencies, so it can start executing even while its precursor functions are still running. DFlow further features a distributed store (DStore) that applies effective fine-grained optimization techniques to eliminate function interaction, thereby enabling efficient data exchange. With the support of DScheduler and DStore, DFlow achieves average improvements in 99th-percentile latency of 60% over CFlow, 40% over FaaSFlow, 25% over FaaSFlowRedis, and 40% over KNIX. It also improves network bandwidth utilization by 2x-4x over CFlow and 1.5x-3x over FaaSFlow, FaaSFlowRedis, and KNIX. Finally, DFlow effectively reduces cold-start latency, with an average improvement of 5.6x over CFlow and 1.1x over FaaSFlow.
    Comment: 22 pages, 13 figures
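    The contrast between the two invocation patterns can be made concrete with a small sketch. The following Python snippet is purely illustrative and assumes nothing about DFlow's actual interfaces: a producer publishes each output as soon as it is ready, a dataflow-style consumer starts once its own data dependency is available, while a control-flow-style consumer must wait for the producer to finish entirely.

        # Illustrative only; the producer/consumer names and the event-based store
        # are assumptions for this sketch, not DFlow's API.
        import asyncio

        async def producer(store, events):
            store["item_a"] = "early result"      # first output is ready early
            events["item_a"].set()                # dataflow: publish it immediately
            await asyncio.sleep(1.0)              # producer keeps running afterwards
            store["item_b"] = "late result"
            events["item_b"].set()

        async def consumer_dataflow(store, events):
            await events["item_a"].wait()         # waits only for its data dependency
            return store["item_a"].upper()

        async def consumer_controlflow(store, producer_task):
            await producer_task                   # waits for the whole precursor
            return store["item_a"].upper()

        async def main():
            store = {}
            events = {"item_a": asyncio.Event(), "item_b": asyncio.Event()}
            loop = asyncio.get_running_loop()
            prod = asyncio.create_task(producer(store, events))
            t0 = loop.time()
            print(f"dataflow:     {await consumer_dataflow(store, events)} at t={loop.time() - t0:.2f}s")
            print(f"control-flow: {await consumer_controlflow(store, prod)} at t={loop.time() - t0:.2f}s")

        asyncio.run(main())

    Under these assumptions the dataflow consumer runs almost immediately, while the control-flow consumer is delayed by roughly the producer's full runtime, which mirrors the end-to-end latency gap the abstract describes.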

    BeeFlow: Behavior Tree-based Serverless Workflow Modeling and Scheduling for Resource-Constrained Edge Clusters

    Serverless computing has gained popularity in edge computing due to its flexible features, including the pay-per-use pricing model, auto-scaling capabilities, and multi-tenancy support. Complex Serverless-based applications typically rely on Serverless workflows (also known as Serverless function orchestration) to express task execution logic, and numerous application- and system-level optimization techniques have been developed for Serverless workflow scheduling. However, there has been limited exploration of optimizing Serverless workflow scheduling in edge computing systems, particularly in high-density, resource-constrained environments such as system-on-chip clusters and single-board-computer clusters. In this work, we discover that existing Serverless workflow scheduling techniques typically assume models with limited expressiveness and cause significant resource contention. To address these issues, we propose modeling Serverless workflows using behavior trees, a novel and fundamentally different approach from existing directed-acyclic-graph- and state-machine-based models. Behavior tree-based modeling allows for easy analysis without compromising workflow expressiveness. We further present observations, derived from the inherent tree structure of behavior trees, that enable contention-free function collections and awareness of exact and empirical concurrent function invocations. Based on these observations, we introduce BeeFlow, a behavior tree-based Serverless workflow system tailored for resource-constrained edge clusters. Experimental results demonstrate that BeeFlow achieves up to 3.2X speedup in a high-density, resource-constrained edge testbed and 2.5X speedup in a high-profile cloud testbed, compared with the state-of-the-art.
    Comment: Accepted by Journal of Systems Architecture
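    To make the modeling idea more tangible, here is a minimal, hypothetical behavior-tree sketch in Python; the class names and the concurrency analysis are assumptions for illustration, not BeeFlow's actual implementation. A Sequence node runs its children one after another and a Parallel node runs them at the same time, so a simple walk over the tree bounds how many functions can be invoked concurrently, the kind of structural analysis a flat DAG encoding does not give as directly.

        # Hypothetical sketch, not BeeFlow's code: a behavior-tree workflow model
        # from which concurrent-invocation bounds can be read off the structure.
        class Task:                                   # leaf: one serverless function
            def __init__(self, name): self.name = name
            def max_concurrency(self): return 1
            def leaves(self): return [self.name]

        class Sequence:                               # children execute one at a time
            def __init__(self, *children): self.children = children
            def max_concurrency(self): return max(c.max_concurrency() for c in self.children)
            def leaves(self): return [n for c in self.children for n in c.leaves()]

        class Parallel(Sequence):                     # children execute concurrently
            def max_concurrency(self): return sum(c.max_concurrency() for c in self.children)

        # Example workflow: preprocess, fan out two analyses, then aggregate.
        wf = Sequence(Task("preprocess"),
                      Parallel(Task("analyze_text"), Task("analyze_image")),
                      Task("aggregate"))
        print(wf.leaves())           # ['preprocess', 'analyze_text', 'analyze_image', 'aggregate']
        print(wf.max_concurrency())  # 2: at most two functions contend for the node at once

    A scheduler on a resource-constrained node could use such a bound to decide how many function containers to keep warm, which is in the spirit of the contention-awareness the abstract claims.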

    Learning Very Large Configuration Spaces: What Matters for Linux Kernel Sizes

    Linux kernels are used in a wide variety of appliances, many of them having strong requirements on the kernel size due to constraints such as limited memory or instant boot. With more than ten thousand configuration options to choose from, obtaining a suitable trade-off between kernel size and functionality is an extremely hard problem. Developers, contributors, and users spend significant effort to document, understand, and eventually tune (combinations of) options to meet a target kernel size. In this paper, we investigate how machine learning can help explain what matters for predicting a given Linux kernel size. Unveiling what matters in such a very large configuration space is challenging for two reasons: (1) however much time we spend, we can only build and measure a tiny fraction of the possible kernel configurations; (2) the prediction model should be both accurate and interpretable. We compare different machine learning algorithms and demonstrate the benefits of specific feature encoding and selection methods to learn an accurate model that is fast to compute and simple to interpret. Our results are validated over 95,854 kernel configurations and show that we can achieve low prediction errors over a reduced set of options. We also show that we can extract interpretable information for refining documentation and experts' knowledge of Linux, or even for assigning more sensible default values to options.
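    As a rough illustration of the pipeline the paper describes (encode configuration options, learn a size predictor, inspect what matters), the sketch below uses synthetic data and an off-the-shelf gradient-boosted tree; the option names are real Kconfig symbols, but the encoding choice and the effect sizes are assumptions made for the example, not the paper's results.

        # Illustrative only: synthetic configurations, not the paper's 95,854
        # measured kernels. Tristate options are encoded as y=2, m=1, n=0.
        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(0)
        options = ["CONFIG_DEBUG_INFO", "CONFIG_KASAN", "CONFIG_MODULES", "CONFIG_SMP"]
        X = rng.integers(0, 3, size=(500, len(options)))
        # Made-up ground truth in which two options dominate the kernel size (MiB).
        size = 20 + 60 * (X[:, 0] == 2) + 15 * (X[:, 1] == 2) + 3 * X[:, 2] \
               + rng.normal(0, 2, size=500)

        model = GradientBoostingRegressor().fit(X, size)
        for name, imp in sorted(zip(options, model.feature_importances_), key=lambda p: -p[1]):
            print(f"{name:20s} importance={imp:.2f}")

    On real data the same idea has to scale to thousands of options, and feature selection plus an interpretable encoding is what keeps the resulting model both accurate and explainable, as the abstract argues.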

    Ginger: A Transactional Middleware with Data and Operation Centric Mixed Consistency

    To meet user demand, many modern digital services need to offer high availability and low response times. To that end, many of them resort to geo-replicated distributed systems. These systems are deployed closer to users, distributing load across multiple servers and allowing faster access and communication. However, to accommodate these systems, the data stores are also split across multiple locations, and committing an operation in such systems requires coordination among the multiple replicas. These systems must allow data to be stored as fast as possible without breaking the safety constraints of developers' systems. There are three main approaches to defining the level of consistency to be guaranteed when accessing data: over data, over operations, or over transactions. The problem with approaches such as consistency over data or consistency over transactions is that they are very limited: they can force operations that could be executed at lower consistency levels to be executed at higher consistency levels. Our approach to this problem is to reconcile the execution of transactions with consistency expressed over both data and operations. We instantiate this proposition in a middleware system, called Ginger, that is deployed between the user and the data stores. Ginger benefits from all the other approaches, allowing the execution of transactions that include operations with different levels of consistency over data with different levels of consistency. This provides the isolation benefits of transactions together with the performance and control offered by consistency defined over operations and consistency defined over data. Our experimental results show that, compared to the previously mentioned approaches such as consistency over data and consistency over transactions, Ginger commits transactions faster. Ginger serves as a proof of concept that using consistency defined over both data and operations within transactions is possible and may be a viable approach. Further development of the system will provide more functionality, further evaluation, and a more in-depth comparison with other systems.
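    The core idea, consistency attached to both data items and individual operations within one transaction, can be sketched as follows; the classes and the "take the stronger of the two levels" rule are assumptions for illustration, not Ginger's actual API or protocol.

        # Hypothetical sketch, not Ginger's implementation: each data item and each
        # operation carries a consistency level, and every write in a transaction is
        # executed at the stronger of the two, so only the writes that genuinely
        # need coordination pay its cost.
        from enum import IntEnum

        class Level(IntEnum):
            EVENTUAL = 0
            STRONG = 1

        DATA_LEVELS = {"balance": Level.STRONG, "last_login": Level.EVENTUAL}

        class Transaction:
            def __init__(self):
                self.ops = []                                   # buffered writes
            def write(self, key, value, op_level=Level.EVENTUAL):
                effective = max(op_level, DATA_LEVELS.get(key, Level.EVENTUAL))
                self.ops.append((key, value, effective))
            def commit(self):
                for key, value, level in self.ops:
                    if level is Level.STRONG:
                        print(f"coordinate replicas, then write {key}={value}")
                    else:
                        print(f"write {key}={value} locally, replicate lazily")

        tx = Transaction()
        tx.write("last_login", "2024-06-01")                    # eventual data, eventual op
        tx.write("balance", 42)                                 # strong data forces coordination
        tx.commit()

    The point of the sketch is only that per-operation and per-data levels can be combined inside one transaction, which is the mixed-consistency behavior the middleware is built around.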