71,256 research outputs found

    A Pattern Language for High-Performance Computing Resilience

    Full text link
    High-performance computing systems (HPC) provide powerful capabilities for modeling, simulation, and data analytics for a broad class of computational problems. They enable extreme performance of the order of quadrillion floating-point arithmetic calculations per second by aggregating the power of millions of compute, memory, networking and storage components. With the rapidly growing scale and complexity of HPC systems for achieving even greater performance, ensuring their reliable operation in the face of system degradations and failures is a critical challenge. System fault events often lead the scientific applications to produce incorrect results, or may even cause their untimely termination. The sheer number of components in modern extreme-scale HPC systems and the complex interactions and dependencies among the hardware and software components, the applications, and the physical environment makes the design of practical solutions that support fault resilience a complex undertaking. To manage this complexity, we developed a methodology for designing HPC resilience solutions using design patterns. We codified the well-known techniques for handling faults, errors and failures that have been devised, applied and improved upon over the past three decades in the form of design patterns. In this paper, we present a pattern language to enable a structured approach to the development of HPC resilience solutions. The pattern language reveals the relations among the resilience patterns and provides the means to explore alternative techniques for handling a specific fault model that may have different efficiency and complexity characteristics. Using the pattern language enables the design and implementation of comprehensive resilience solutions as a set of interconnected resilience patterns that can be instantiated across layers of the system stack.Comment: Proceedings of the 22nd European Conference on Pattern Languages of Program

    Management and Service-aware Networking Architectures (MANA) for Future Internet Position Paper: System Functions, Capabilities and Requirements

    Get PDF
    Future Internet (FI) research and development threads have recently been gaining momentum all over the world and as such the international race to create a new generation Internet is in full swing: GENI, Asia Future Internet, Future Internet Forum Korea, European Union Future Internet Assembly (FIA). This is a position paper identifying the research orientation with a time horizon of 10 years, together with the key challenges for the capabilities in the Management and Service-aware Networking Architectures (MANA) part of the Future Internet (FI) allowing for parallel and federated Internet(s)

    Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World

    Get PDF
    This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394 "Software Performance Engineering in the DevOps World". The seminar addressed the problem of performance-aware DevOps. Both, DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rise in importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these back to the development (Dev). However, so far, the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify cross-community collaboration, and to set the path for long-lasting collaborations towards performance-aware DevOps. The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as PostDocs or Junior Professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations

    Microservice Transition and its Granularity Problem: A Systematic Mapping Study

    Get PDF
    Microservices have gained wide recognition and acceptance in software industries as an emerging architectural style for autonomic, scalable, and more reliable computing. The transition to microservices has been highly motivated by the need for better alignment of technical design decisions with improving value potentials of architectures. Despite microservices' popularity, research still lacks disciplined understanding of transition and consensus on the principles and activities underlying "micro-ing" architectures. In this paper, we report on a systematic mapping study that consolidates various views, approaches and activities that commonly assist in the transition to microservices. The study aims to provide a better understanding of the transition; it also contributes a working definition of the transition and technical activities underlying it. We term the transition and technical activities leading to microservice architectures as microservitization. We then shed light on a fundamental problem of microservitization: microservice granularity and reasoning about its adaptation as first-class entities. This study reviews state-of-the-art and -practice related to reasoning about microservice granularity; it reviews modelling approaches, aspects considered, guidelines and processes used to reason about microservice granularity. This study identifies opportunities for future research and development related to reasoning about microservice granularity.Comment: 36 pages including references, 6 figures, and 3 table
    • 

    corecore