71,256 research outputs found
A Pattern Language for High-Performance Computing Resilience
High-performance computing systems (HPC) provide powerful capabilities for
modeling, simulation, and data analytics for a broad class of computational
problems. They enable extreme performance of the order of quadrillion
floating-point arithmetic calculations per second by aggregating the power of
millions of compute, memory, networking and storage components. With the
rapidly growing scale and complexity of HPC systems for achieving even greater
performance, ensuring their reliable operation in the face of system
degradations and failures is a critical challenge. System fault events often
lead the scientific applications to produce incorrect results, or may even
cause their untimely termination. The sheer number of components in modern
extreme-scale HPC systems and the complex interactions and dependencies among
the hardware and software components, the applications, and the physical
environment makes the design of practical solutions that support fault
resilience a complex undertaking. To manage this complexity, we developed a
methodology for designing HPC resilience solutions using design patterns. We
codified the well-known techniques for handling faults, errors and failures
that have been devised, applied and improved upon over the past three decades
in the form of design patterns. In this paper, we present a pattern language to
enable a structured approach to the development of HPC resilience solutions.
The pattern language reveals the relations among the resilience patterns and
provides the means to explore alternative techniques for handling a specific
fault model that may have different efficiency and complexity characteristics.
Using the pattern language enables the design and implementation of
comprehensive resilience solutions as a set of interconnected resilience
patterns that can be instantiated across layers of the system stack.Comment: Proceedings of the 22nd European Conference on Pattern Languages of
Program
Management and Service-aware Networking Architectures (MANA) for Future Internet Position Paper: System Functions, Capabilities and Requirements
Future Internet (FI) research and development threads have recently been gaining momentum all over the world and as such the international race to create a new generation Internet is in full swing: GENI, Asia Future Internet, Future Internet Forum Korea, European Union Future Internet Assembly (FIA). This is a position paper identifying the research orientation with a time horizon of 10 years, together with the key challenges for the capabilities in the Management and Service-aware Networking Architectures (MANA) part of the Future Internet (FI) allowing for parallel and federated Internet(s)
Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World
This report documents the program and the outcomes of GI-Dagstuhl Seminar
16394 "Software Performance Engineering in the DevOps World".
The seminar addressed the problem of performance-aware DevOps. Both, DevOps
and performance engineering have been growing trends over the past one to two
years, in no small part due to the rise in importance of identifying
performance anomalies in the operations (Ops) of cloud and big data systems and
feeding these back to the development (Dev). However, so far, the research
community has treated software engineering, performance engineering, and cloud
computing mostly as individual research areas. We aimed to identify
cross-community collaboration, and to set the path for long-lasting
collaborations towards performance-aware DevOps.
The main goal of the seminar was to bring together young researchers (PhD
students in a later stage of their PhD, as well as PostDocs or Junior
Professors) in the areas of (i) software engineering, (ii) performance
engineering, and (iii) cloud computing and big data to present their current
research projects, to exchange experience and expertise, to discuss research
challenges, and to develop ideas for future collaborations
Microservice Transition and its Granularity Problem: A Systematic Mapping Study
Microservices have gained wide recognition and acceptance in software
industries as an emerging architectural style for autonomic, scalable, and more
reliable computing. The transition to microservices has been highly motivated
by the need for better alignment of technical design decisions with improving
value potentials of architectures. Despite microservices' popularity, research
still lacks disciplined understanding of transition and consensus on the
principles and activities underlying "micro-ing" architectures. In this paper,
we report on a systematic mapping study that consolidates various views,
approaches and activities that commonly assist in the transition to
microservices. The study aims to provide a better understanding of the
transition; it also contributes a working definition of the transition and
technical activities underlying it. We term the transition and technical
activities leading to microservice architectures as microservitization. We then
shed light on a fundamental problem of microservitization: microservice
granularity and reasoning about its adaptation as first-class entities. This
study reviews state-of-the-art and -practice related to reasoning about
microservice granularity; it reviews modelling approaches, aspects considered,
guidelines and processes used to reason about microservice granularity. This
study identifies opportunities for future research and development related to
reasoning about microservice granularity.Comment: 36 pages including references, 6 figures, and 3 table
- âŠ