224 research outputs found

    RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design

    Full text link
    Software-defined networking (SDN) and software-defined flash (SDF) have been serving as the backbone of modern data centers. They are managed separately to handle I/O requests. At first glance, this is a reasonable design by following the rack-scale hierarchical design principles. However, it suffers from suboptimal end-to-end performance, due to the lack of coordination between SDN and SDF. In this paper, we co-design the SDN and SDF stack by redefining the functions of their control plane and data plane, and splitting up them within a new architecture named RackBlox. RackBlox decouples the storage management functions of flash-based solid-state drives (SSDs), and allow the SDN to track and manage the states of SSDs in a rack. Therefore, we can enable the state sharing between SDN and SDF, and facilitate global storage resource management. RackBlox has three major components: (1) coordinated I/O scheduling, in which it dynamically adjusts the I/O scheduling in the storage stack with the measured and predicted network latency, such that it can coordinate the effort of I/O scheduling across the network and storage stack for achieving predictable end-to-end performance; (2) coordinated garbage collection (GC), in which it will coordinate the GC activities across the SSDs in a rack to minimize their impact on incoming I/O requests; (3) rack-scale wear leveling, in which it enables global wear leveling among SSDs in a rack by periodically swapping data, for achieving improved device lifetime for the entire rack. We implement RackBlox using programmable SSDs and switch. Our experiments demonstrate that RackBlox can reduce the tail latency of I/O requests by up to 5.8x over state-of-the-art rack-scale storage systems.Comment: 14 pages. Published in published in ACM SIGOPS 29th Symposium on Operating Systems Principles (SOSP'23

    Anpassen verteilter eingebetteter Anwendungen im laufenden Betrieb

    Get PDF
    The availability of third-party apps is among the key success factors for software ecosystems: The users benefit from more features and innovation speed, while third-party solution vendors can leverage the platform to create successful offerings. However, this requires a certain decoupling of engineering activities of the different parties not achieved for distributed control systems, yet. While late and dynamic integration of third-party components would be required, resulting control systems must provide high reliability regarding real-time requirements, which leads to integration complexity. Closing this gap would particularly contribute to the vision of software-defined manufacturing, where an ecosystem of modern IT-based control system components could lead to faster innovations due to their higher abstraction and availability of various frameworks. Therefore, this thesis addresses the research question: How we can use modern IT technologies and enable independent evolution and easy third-party integration of software components in distributed control systems, where deterministic end-to-end reactivity is required, and especially, how can we apply distributed changes to such systems consistently and reactively during operation? This thesis describes the challenges and related approaches in detail and points out that existing approaches do not fully address our research question. To tackle this gap, a formal specification of a runtime platform concept is presented in conjunction with a model-based engineering approach. The engineering approach decouples the engineering steps of component definition, integration, and deployment. The runtime platform supports this approach by isolating the components, while still offering predictable end-to-end real-time behavior. Independent evolution of software components is supported through a concept for synchronous reconfiguration during full operation, i.e., dynamic orchestration of components. Time-critical state transfer is supported, too, and can lead to bounded quality degradation, at most. The reconfiguration planning is supported by analysis concepts, including simulation of a formally specified system and reconfiguration, and analyzing potential quality degradation with the evolving dataflow graph (EDFG) method. A platform-specific realization of the concepts, the real-time container architecture, is described as a reference implementation. The model and the prototype are evaluated regarding their feasibility and applicability of the concepts by two case studies. The first case study is a minimalistic distributed control system used in different setups with different component variants and reconfiguration plans to compare the model and the prototype and to gather runtime statistics. The second case study is a smart factory showcase system with more challenging application components and interface technologies. The conclusion is that the concepts are feasible and applicable, even though the concepts and the prototype still need to be worked on in future -- for example, to reach shorter cycle times.Eine große Auswahl von Drittanbieter-Lösungen ist einer der Schlüsselfaktoren für Software Ecosystems: Nutzer profitieren vom breiten Angebot und schnellen Innovationen, während Drittanbieter über die Plattform erfolgreiche Lösungen anbieten können. Das jedoch setzt eine gewisse Entkopplung von Entwicklungsschritten der Beteiligten voraus, welche für verteilte Steuerungssysteme noch nicht erreicht wurde. Während Drittanbieter-Komponenten möglichst spät -- sogar Laufzeit -- integriert werden müssten, müssen Steuerungssysteme jedoch eine hohe Zuverlässigkeit gegenüber Echtzeitanforderungen aufweisen, was zu Integrationskomplexität führt. Dies zu lösen würde insbesondere zur Vision von Software-definierter Produktion beitragen, da ein Ecosystem für moderne IT-basierte Steuerungskomponenten wegen deren höherem Abstraktionsgrad und der Vielzahl verfügbarer Frameworks zu schnellerer Innovation führen würde. Daher behandelt diese Dissertation folgende Forschungsfrage: Wie können wir moderne IT-Technologien verwenden und unabhängige Entwicklung und einfache Integration von Software-Komponenten in verteilten Steuerungssystemen ermöglichen, wo Ende-zu-Ende-Echtzeitverhalten gefordert ist, und wie können wir insbesondere verteilte Änderungen an solchen Systemen konsistent und im Vollbetrieb vornehmen? Diese Dissertation beschreibt Herausforderungen und verwandte Ansätze im Detail und zeigt auf, dass existierende Ansätze diese Frage nicht vollständig behandeln. Um diese Lücke zu schließen, beschreiben wir eine formale Spezifikation einer Laufzeit-Plattform und einen zugehörigen Modell-basierten Engineering-Ansatz. Dieser Ansatz entkoppelt die Design-Schritte der Entwicklung, Integration und des Deployments von Komponenten. Die Laufzeit-Plattform unterstützt den Ansatz durch Isolation von Komponenten und zugleich Zeit-deterministischem Ende-zu-Ende-Verhalten. Unabhängige Entwicklung und Integration werden durch Konzepte für synchrone Rekonfiguration im Vollbetrieb unterstützt, also durch dynamische Orchestrierung. Dies beinhaltet auch Zeit-kritische Zustands-Transfers mit höchstens begrenzter Qualitätsminderung, wenn überhaupt. Rekonfigurationsplanung wird durch Analysekonzepte unterstützt, einschließlich der Simulation formal spezifizierter Systeme und Rekonfigurationen und der Analyse der etwaigen Qualitätsminderung mit dem Evolving Dataflow Graph (EDFG). Die Real-Time Container Architecture wird als Referenzimplementierung und Evaluationsplattform beschrieben. Zwei Fallstudien untersuchen Machbarkeit und Nützlichkeit der Konzepte. Die erste verwendet verschiedene Varianten und Rekonfigurationen eines minimalistischen verteilten Steuerungssystems, um Modell und Prototyp zu vergleichen sowie Laufzeitstatistiken zu erheben. Die zweite Fallstudie ist ein Smart-Factory-Demonstrator, welcher herausforderndere Applikationskomponenten und Schnittstellentechnologien verwendet. Die Konzepte sind den Studien nach machbar und nützlich, auch wenn sowohl die Konzepte als auch der Prototyp noch weitere Arbeit benötigen -- zum Beispiel, um kürzere Zyklen zu erreichen

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Get PDF
    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context the concept of programmable data planes allows directly to program the packet processing pipeline of networking devices and create custom control plane algorithms. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet high demands of next-generation networks like 5G, Internet of Things, cloud computing, and industry 4.0. P4 is the most popular technology to implement programmable data planes. However, programmable data planes, and in particular, the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking. The research of this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane as there are no satisfying state of the art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic which has only very limited support of high-performance forwarding platforms yet. The main results of this thesis are published as 8 peer-reviewed and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and the extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results. They implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts

    Multi-tier GPU virtualization for deep learning in cloud-edge systems

    Get PDF
    Accelerator virtualization offers several advantages in the context of cloud-edge computing. Relatively weak user devices can enhance performance when running workloads by accessing virtualized accelerators available on other resources in the cloud-edge continuum. However, cloud-edge systems are heterogeneous, often leading to compatibility issues arising from various hardware and software stacks present in the system. One mechanism to alleviate this issue is using containers for deploying workloads. Containers isolate applications and their dependencies and store them as images that can run on any device. In addition, user devices may move during the course of application execution, and thus mechanisms such as container migration are required to move running workloads from one resource to another in the network. Furthermore, an optimal destination will need to be determined when migrating between virtual accelerators. Scheduling and placement strategies are incorporated to choose the best possible location depending on the workload requirements. This paper presents AVEC , a framework for accelerator virtualization in cloud-edge computing. The AVEC framework enables the offloading of deep learning workloads for inference from weak user devices to computationally more powerful devices in a cloud-edge network. AVEC incorporates a mechanism that efficiently manages and schedules the virtualization of accelerators. It also supports migration between accelerators to enable stateless container migration. The experimental analysis highlights that AVEC can achieve up to 7x speedup by offloading applications to remote resources. Furthermore, AVEC features a low migration downtime that is less than 5 seconds.PostprintPeer reviewe

    Availability Analysis of Redundant and Replicated Cloud Services with Bayesian Networks

    Full text link
    Due to the growing complexity of modern data centers, failures are not uncommon any more. Therefore, fault tolerance mechanisms play a vital role in fulfilling the availability requirements. Multiple availability models have been proposed to assess compute systems, among which Bayesian network models have gained popularity in industry and research due to its powerful modeling formalism. In particular, this work focuses on assessing the availability of redundant and replicated cloud computing services with Bayesian networks. So far, research on availability has only focused on modeling either infrastructure or communication failures in Bayesian networks, but have not considered both simultaneously. This work addresses practical modeling challenges of assessing the availability of large-scale redundant and replicated services with Bayesian networks, including cascading and common-cause failures from the surrounding infrastructure and communication network. In order to ease the modeling task, this paper introduces a high-level modeling formalism to build such a Bayesian network automatically. Performance evaluations demonstrate the feasibility of the presented Bayesian network approach to assess the availability of large-scale redundant and replicated services. This model is not only applicable in the domain of cloud computing it can also be applied for general cases of local and geo-distributed systems.Comment: 16 pages, 12 figures, journa

    Service Provisioning in Edge-Cloud Continuum Emerging Applications for Mobile Devices

    Get PDF
    Disruptive applications for mobile devices can be enhanced by Edge computing facilities. In this context, Edge Computing (EC) is a proposed architecture to meet the mobility requirements imposed by these applications in a wide range of domains, such as the Internet of Things, Immersive Media, and Connected and Autonomous Vehicles. EC architecture aims to introduce computing capabilities in the path between the user and the Cloud to execute tasks closer to where they are consumed, thus mitigating issues related to latency, context awareness, and mobility support. In this survey, we describe which are the leading technologies to support the deployment of EC infrastructure. Thereafter, we discuss the applications that can take advantage of EC and how they were proposed in the literature. Finally, after examining enabling technologies and related applications, we identify some open challenges to fully achieve the potential of EC, and also research opportunities on upcoming paradigms for service provisioning. This survey is a guide to comprehend the recent advances on the provisioning of mobile applications, as well as foresee the expected next stages of evolution for these applications

    BeeFlow: Behavior Tree-based Serverless Workflow Modeling and Scheduling for Resource-Constrained Edge Clusters

    Full text link
    Serverless computing has gained popularity in edge computing due to its flexible features, including the pay-per-use pricing model, auto-scaling capabilities, and multi-tenancy support. Complex Serverless-based applications typically rely on Serverless workflows (also known as Serverless function orchestration) to express task execution logic, and numerous application- and system-level optimization techniques have been developed for Serverless workflow scheduling. However, there has been limited exploration of optimizing Serverless workflow scheduling in edge computing systems, particularly in high-density, resource-constrained environments such as system-on-chip clusters and single-board-computer clusters. In this work, we discover that existing Serverless workflow scheduling techniques typically assume models with limited expressiveness and cause significant resource contention. To address these issues, we propose modeling Serverless workflows using behavior trees, a novel and fundamentally different approach from existing directed-acyclic-graph- and state machine-based models. Behavior tree-based modeling allows for easy analysis without compromising workflow expressiveness. We further present observations derived from the inherent tree structure of behavior trees for contention-free function collections and awareness of exact and empirical concurrent function invocations. Based on these observations, we introduce BeeFlow, a behavior tree-based Serverless workflow system tailored for resource-constrained edge clusters. Experimental results demonstrate that BeeFlow achieves up to 3.2X speedup in a high-density, resource-constrained edge testbed and 2.5X speedup in a high-profile cloud testbed, compared with the state-of-the-art.Comment: Accepted by Journal of Systems Architectur

    Co-designing reliability and performance for datacenter memory

    Get PDF
    Memory is one of the key components that affects reliability and performance of datacenter servers. Memory in today’s servers is organized and shared in several ways to provide the most performant and efficient access to data. For example, cache hierarchy in multi-core chips to reduce access latency, non-uniform memory access (NUMA) in multi-socket servers to improve scalability, disaggregation to increase memory capacity. In all these organizations, hardware coherence protocols are used to maintain memory consistency of this shared memory and implicitly move data to the requesting cores. This thesis aims to provide fault-tolerance against newer models of failure in the organization of memory in datacenter servers. While designing for improved reliability, this thesis explores solutions that can also enhance performance of applications. The solutions build over modern coherence protocols to achieve these properties. First, we observe that DRAM memory system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware driven replication mechanism where data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Data blocks are accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Dvé’s organization offers two independent points of access to data which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory and (b) higher performance by providing another nearer point of memory access. Dvé’s coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can flexibly provide these benefits on-demand at runtime. Next, we observe that the coherence protocol itself requires to be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region in the disaggregated memory. However, the CXL specification lacks the requisite level of fault-tolerance necessary to operate at an inter-server scale within the datacenter. Compute servers can fail or be unresponsive in the datacenter and therefore, it is important that the coherence protocol remain available in the presence of such failures. The thesis proposes Āpta, a CXL-based, shared disaggregated memory system for keeping the cached data consistent without compromising availability in the face of compute server failures. Āpta architects a high-performance fault-tolerant object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications

    Uma arquitetura de alta disponibilidade para funções e serviços virtualizados de rede

    Get PDF
    Orientador: Elias P. Duarte Jr.Tese (doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 10/02/2023Inclui referênciasÁrea de concentração: Ciência da ComputaçãoResumo: A virtualizacao vem revolucionando a forma como as redes sao construidas e gerenciadas, permitindo a sua evolucao para multiplas direcoes. A Virtualizacao de Funcoes de Rede (Network Function Virtualization - NFV) pode gerar mudancas significativas na rede, uma vez que funcoes de rede tradicionalmente implementadas em hardware dedicado podem ser substituidas por software, denominadas de Funcoes Virtualizadas de Rede (Virtualized Network Functions - VNFs). Apesar das vantagens, alguns desafios ainda devem ser explorados para permitir a sua ampla adocao. As contribuicoes propostas nesta Tese de Doutorado sao divididas em tres partes. A primeira parte explora o fato de que servicos virtualizados possuem maior susceptibilidade a falhas do que as suas alternativas em hardware dedicado. Tendo em vista que as redes se tornaram extremamente necessarias, e essencial garantir a execucao correta e continua dos servicos. A primeira contribuicao propoe uma arquitetura NFV de alta disponibilidade para servicos de rede. A arquitetura realiza o gerenciamento de falhas, oferecendo multiplas estrategias de recuperacao, alem de preservar o estado das VNFs atraves de tecnicas de Checkpoint/Restore. Um prototipo da arquitetura foi implementado e resultados experimentais mostram que e possivel atingir niveis de disponibilidade similares aos de sistemas comerciais de telecomunicacoes. A segunda parte desta Tese analisa a susceptibilidade a falhas de servicos sob outra perspectiva. Considerando que tais servicos sao geralmente executados sobre uma arquitetura subjacente, a falha em algum componente desta arquitetura pode afetar toda a infraestrutura, impedindo o acesso ou possibilitando o uso nao autorizado do sistema. A segunda contribuicao propoe a FIT-SFC (Fault- & Intrusion- Tolerant SFC): uma arquitetura para suportar servicos virtuais seguros e altamente disponiveis. A FIT-SFC e baseada em tecnicas de replicacao para tolerar falhas por parada, por omissao ou intrusao em qualquer de seus componentes. Um prototipo da arquitetura foi implementado e resultados sao apresentados para os custos para tolerar as falhas. Por fim, a terceira parte desta Tese investiga a possibilidade de utilizar NFV para permitir a execucao de servicos virtualizados dentro da propria rede. No conceito chamado COIN (COmputing In the Network), aplicacoes normalmente executadas pelos proprios usuarios finais passam a ser inteiramente executados nativamente dentro da rede. A terceira contribuicao explora a sinergia entre NFV e COIN, denominada de NFV-COIN. Uma arquitetura e proposta para permitir a oferta e gerenciamento de servicos NFV-COIN, alem de oferecer uma interface alto nivel que permite a manipulacao padronizada dos servicos de rede. Experimentos executados em um prototipo implementado da arquitetura demonstram a possibilidade de oferecer e executar servicos NFV-COIN sem introduzir perdas significativas de desempenho.Abstract: Virtualization has represented a true revolution in the way networks are built and managed, allowing them to evolve along multiple directions. In particular, Network Function Virtualization (NFV) has been causing deep changes, since network functions traditionally implemented as specialized hardware can be replaced by software, called Virtualized Network Functions (VNFs). Despite their many advantages, some challenges still need to be addressed in order to allow its wide adoption. The contributions proposed in this Doctoral Thesis are divided into three parts. The first part explores the fact that virtualized services are more prone to failures than traditional alternatives available as specialized hardware. As networks have become extremely necessary, it is essential to ensure the correct and continuous operation of services. The first contribution proposes a high availability architecture for NFV-based services. The architecture performs fault management, offers multiple recovery strategies, while also preserving the state of VNFs through Checkpoint/Restore techniques. A prototype was implemented and experimental results show that it is possible to reach carrier-grade availability. The second part of this Thesis analyzes the susceptibility of service failures from another perspective. Considering that virtualized services usually run on an underlying architecture, the failure of any component of this architecture could affect the entire infrastructure, restraining access or allowing unauthorized use of the system. Therefore, the second contribution proposes the FIT-SFC (Fault- & Intrusion- Tolerant SFC): an architecture to support secure and highly available virtual services. The FIT-SFC architecture is based on replication techniques to tolerate crash, omission, or intrusion failures in any of its components. A prototype was implemented and results are presented for the costs to tolerate failures. Finally, the third part of this Thesis investigates the possibility of using NFV to allow the execution of virtualized services within the network. In the context of COIN (COmputing In the Network), applications that are usually executed by the end users themselves, can now be entirely executed natively within the network. The third contribution explores the synergy between NFV and COIN, called NFV-COIN. An architecture is proposed to allow the deployment and management of NFV-COIN services, while also offering a high-level interface that allows standardized operation of network services. Experiments were executed on an implemented prototype and results demonstrate the possibility of deploying and executing NFV-COIN services without introducing significant performance losses
    corecore