1,401 research outputs found

    ENHANCING CLOUD SYSTEM RUNTIME TO ADDRESS COMPLEX FAILURES

    Get PDF
    As the reliance on cloud systems intensifies in our progressively digital world, understanding and reinforcing their reliability becomes more crucial than ever. Despite impressive advancements in augmenting the resilience of cloud systems, the growing incidence of complex failures now poses a substantial challenge to the availability of these systems. With cloud systems continuing to scale and increase in complexity, failures not only become more elusive to detect but can also lead to more catastrophic consequences. Such failures question the foundational premises of conventional fault-tolerance designs, necessitating the creation of novel system designs to counteract them. This dissertation aims to enhance distributed systems’ capabilities to detect, localize, and react to complex failures at runtime. To this end, this dissertation makes contributions to address three emerging categories of failures in cloud systems. The first part delves into the investigation of partial failures, introducing OmegaGen, a tool adept at generating tailored checkers for detecting and localizing such failures. The second part grapples with silent semantic failures prevalent in cloud systems, showcasing our study findings, and introducing Oathkeeper, a tool that leverages past failures to infer rules and expose these silent issues. The third part explores solutions to slow failures via RESIN, a framework specifically designed to detect, diagnose, and mitigate memory leaks in cloud-scale infrastructures, developed in collaboration with Microsoft Azure. The dissertation concludes by offering insights into future directions for the construction of reliable cloud systems

    Enabling HW-based task scheduling in large multicore architectures

    Get PDF
    Dynamic Task Scheduling is an enticing programming model aiming to ease the development of parallel programs with intrinsically irregular or data-dependent parallelism. The performance of such solutions relies on the ability of the Task Scheduling HW/SW stack to efficiently evaluate dependencies at runtime and schedule work to available cores. Traditional SW-only systems implicate scheduling overheads of around 30K processor cycles per task, which severely limit the ( core count , task granularity ) combinations that they might adequately handle. Previous work on HW-accelerated Task Scheduling has shown that such systems might support high performance scheduling on processors with up to eight cores, but questions remained regarding the viability of such solutions to support the greater number of cores now frequently found in high-end SMP systems. The present work presents an FPGA-proven, tightly-integrated, Linux-capable, 30-core RISC-V system with hardware accelerated Task Scheduling. We use this implementation to show that HW Task Scheduling can still offer competitive performance at such high core count, and describe how this organization includes hardware and software optimizations that make it even more scalable than previous solutions. Finally, we outline ways in which this architecture could be augmented to overcome inter-core communication bottlenecks, mitigating the cache-degradation effects usually involved in the parallelization of highly optimized serial code.This work is supported by the TEXTAROSSA project G.A. n.956831, as part of the EuroHPC initiative, by the Spanish Government (grants PCI2021-121964, TEXTAROSSA; PDC2022-133323-I00, Multi-Ka; PID2019-107255GB-C21 MCIN/AEI/10.13039/501100011033; and CEX2021-001148-S), by Generalitat de Catalunya (2021 SGR 01007), and FAPESP (grant 2019/26702-8).Peer ReviewedPostprint (published version

    Tools for efficient Deep Learning

    Get PDF
    In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on various platforms. This thesis proposes five tools to address the challenges for designing DNNs that are efficient in time, in resources and in power consumption. We first present Aegis and SPGC to address the challenges in improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) stabler by layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by at most 4\%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC have maximally 1\% higher accuracy than prior work. This thesis also addresses the challenges lying in the gap between DNN descriptions and executables by Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to expand polyhedral optimisation. Polygeist can speed up software execution in sequential and parallel by 2.53 and 9.47 times on Polybench/C. POLSCA achieves 1.5 times speedup over hardware designs directly generated from high-level synthesis on Polybench/C. Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators of streaming architectures with advanced pipelining techniques to address the challenges from heterogeneous convolution and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon shows resource/power consumption efficiency improvement of 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets. All these tools are open source, some of which have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.Open Acces

    Northeastern Illinois University, Academic Catalog 2023-2024

    Get PDF
    https://neiudc.neiu.edu/catalogs/1064/thumbnail.jp

    Intégration des méthodes formelles dans le développement des RCSFs

    Get PDF
    In this thesis, we have relied on formal techniques in order to first evaluate WSN protocols and then to propose solutions that meet the requirements of these networks. The thesis contributes to the modelling, analysis, design and evaluation of WSN protocols. In this context, the thesis begins with a survey on WSN and formal verification techniques. Focusing on the MAC layer, the thesis reviews proposed MAC protocols for WSN as well as their design challenges. The dissertation then proceeds to outline the contributions of this work. As a first proposal, we develop a stochastic generic model of the 802.11 MAC protocol for an arbitrary network topology and then perform probabilistic evaluation of the protocol using statistical model checking. Considering an alternative power source to operate WSN, energy harvesting, we move to the second proposal where a protocol designed for EH-WSN is modelled and various performance parameters are evaluated. Finally, the thesis explores mobility in WSN and proposes a new MAC protocol, named "Mobility and Energy Harvesting aware Medium Access Control (MEH-MAC)" protocol for dynamic sensor networks powered by ambient energy. The protocol is modelled and verified under several features

    Flip: Data-Centric Edge CGRA Accelerator

    Full text link
    Coarse-Grained Reconfigurable Arrays (CGRA) are promising edge accelerators due to the outstanding balance in flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PE) and route the data dependencies among the operations through the Network-on-Chip. However, CGRAs are designed for fine-grained static instruction-level parallelism and struggle to accelerate applications with dynamic and irregular data-level parallelism, such as graph processing. To address this limitation, we present Flip, a novel accelerator that enhances traditional CGRA architectures to boost the performance of graph applications. Flip retains the classic CGRA execution model while introducing a special data-centric mode for efficient graph processing. Specifically, it exploits the natural data parallelism of graph algorithms by mapping graph vertices onto processing elements (PEs) rather than the operations, and supporting dynamic routing of temporary data according to the runtime evolution of the graph frontier. Experimental results demonstrate that Flip achieves up to 36×\times speedup with merely 19% more area compared to classic CGRAs. Compared to state-of-the-art large-scale graph processors, Flip has similar energy efficiency and 2.2×\times better area efficiency at a much-reduced power/area budget

    Toward Fault-Tolerant Applications on Reconfigurable Systems-on-Chip

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Multi-Agent Modelling of Industrial Cyber-Physical Systems for IEC 61499 Based Distributed Intelligent Automation

    Get PDF
    Traditional industrial automation systems developed under IEC 61131-3 in centralized architectures are statically programmed with determined procedures to perform predefined tasks in structured environments. Major challenges are that these systems designed under traditional engineering techniques and running on legacy automation platforms are unable to automatically discover alternative solutions, flexibly coordinate reconfigurable modules, and actively deploy corresponding functions, to quickly respond to frequent changes and intelligently adapt to evolving requirements in dynamic environments. The core objective of this research is to explore the design of multi-layer automation architectures to enable real-time adaptation at the device level and run-time intelligence throughout the whole system under a well-integrated modelling framework. Central to this goal is the research on the integration of multi-agent modelling and IEC 61499 function block modelling to form a new automation infrastructure for industrial cyber-physical systems. Multi-agent modelling uses autonomous and cooperative agents to achieve run-time intelligence in system design and module reconfiguration. IEC 61499 function block modelling applies object-oriented and event-driven function blocks to realize real-time adaption of automation logic and control algorithms. In this thesis, the design focuses on a two-layer self-manageable architecture modelling: a) the high-level cyber module designed as multi-agent computing model consisting of Monitoring Agent, Analysis Agent, Self-Learning Agent, Planning Agent, Execution Agent, and Knowledge Agent; and b) the low-level physical module designed as agent-embedded IEC 61499 function block model with Self-Manageable Service Execution Agent, Self-Configuration Agent, Self-Healing Agent, Self-Optimization Agent, and Self-Protection Agent. The design results in a new computing module for high-level multi-agent based automation architectures and a new design pattern for low-level function block modelled control solutions. The architecture modelling framework is demonstrated through various tests on the multi-agent simulation model developed in the agent modelling environment NetLogo and the experimental testbed designed on the Jetson Nano and Raspberry Pi platforms. The performance evaluation of regular execution time and adaptation time in two typical conditions for systems designed under three different architectures are also analyzed. The results demonstrate the ability of the proposed architecture to respond to major challenges in Industry 4.0

    Lessons from Formally Verified Deployed Software Systems (Extended version)

    Full text link
    The technology of formal software verification has made spectacular advances, but how much does it actually benefit the development of practical software? Considerable disagreement remains about the practicality of building systems with mechanically-checked proofs of correctness. Is this prospect confined to a few expensive, life-critical projects, or can the idea be applied to a wide segment of the software industry? To help answer this question, the present survey examines a range of projects, in various application areas, that have produced formally verified systems and deployed them for actual use. It considers the technologies used, the form of verification applied, the results obtained, and the lessons that can be drawn for the software industry at large and its ability to benefit from formal verification techniques and tools. Note: a short version of this paper is also available, covering in detail only a subset of the considered systems. The present version is intended for full reference.Comment: arXiv admin note: text overlap with arXiv:1211.6186 by other author

    High Level Concurrency in C∀

    Get PDF
    Concurrent programs are notoriously hard to write and even harder to debug. Furthermore concurrent programs must be performant, as the introduction of concurrency into a program is often done to achieve some form of speedup. This thesis presents a suite of high-level concurrent-language features in the new programming language C∀, all of which are implemented with the aim of improving the performance, productivity, and safety of concurrent programs. C∀ is a non-object-oriented programming language that extends C. The foundation for concurrency in C∀ was laid by Thierry Delisle [15], who implemented coroutines, user-level threads, and monitors. This thesis builds upon that work and introduces a suite of new concurrent features as its main contribution. The features include a mutex statement (similar to a C++ scoped lock or Java synchronized statement), Go-like channels and select statement, and an actor system. The root ideas behind these features are not new, but the C∀ implementations extends the original ideas in performance, productivity, and safety
    corecore