May 9th, 2017

Author: Hukerikar Saurabh
Publication venue: Barcelona Supercomputing Center
Publication date: 10/09/2017

Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest the prevalence of very high fault rates in future systems. While the HPC community has developed various resilience solutions, application-level techniques as well as system-based solutions, the solution space of HPC resilience techniques remains fragmented. There are no formal methods and metrics to investigate and evaluate resilience holistically in HPC systems that consider impact scope, handling coverage, and performance & power efficiency. Few of the current approaches are portable to newer architectures and software environments that will be deployed on future systems. In this talk, I will present a structured approach to the management of HPC resilience using the concept of resilience-based design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the commonly occurring problems and solutions used to deal with faults, errors and failures in HPC systems. Each established solution is described in the form of a pattern that addresses concrete problems. We have developed a complete catalog of resilience design patterns, which provides designers with a collection of such reusable design elements. We have also defined a framework that enhances a designer's understanding of the important constraints and opportunities for the design patterns to be implemented and deployed at various layers of the system stack. This design framework may be used to establish mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The framework also enables optimization of the cost-benefit trade-offs among performance, resilience, and power consumption. 
The overall goal of this work is to enable a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems.

UPCommons. Portal del coneixement obert de la UPC

A Pattern Language for High-Performance Computing Resilience

Author: Chung Jinsuk
Mohror Kathryn
Saridakis Titos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/10/2017

High-performance computing (HPC) systems provide powerful capabilities for modeling, simulation, and data analytics for a broad class of computational problems. They enable extreme performance on the order of quadrillions of floating-point arithmetic calculations per second by aggregating the power of millions of compute, memory, networking and storage components. With the rapidly growing scale and complexity of HPC systems for achieving even greater performance, ensuring their reliable operation in the face of system degradations and failures is a critical challenge. System fault events often lead scientific applications to produce incorrect results, or may even cause their untimely termination. The sheer number of components in modern extreme-scale HPC systems and the complex interactions and dependencies among the hardware and software components, the applications, and the physical environment make the design of practical solutions that support fault resilience a complex undertaking. To manage this complexity, we developed a methodology for designing HPC resilience solutions using design patterns. We codified the well-known techniques for handling faults, errors and failures that have been devised, applied and improved upon over the past three decades in the form of design patterns. In this paper, we present a pattern language to enable a structured approach to the development of HPC resilience solutions. The pattern language reveals the relations among the resilience patterns and provides the means to explore alternative techniques for handling a specific fault model that may have different efficiency and complexity characteristics. Using the pattern language enables the design and implementation of comprehensive resilience solutions as a set of interconnected resilience patterns that can be instantiated across layers of the system stack.
Comment: Proceedings of the 22nd European Conference on Pattern Languages of Program

arXiv.org e-Print Archive

Crossref

TLAD 2010 Proceedings: 8th international workshop on teaching, learning and assessment of databases (TLAD)

Publication date: 01/01/2010

This is the eighth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2010), which once again is held as a workshop of BNCOD 2010 - the 27th International Information Systems Conference. TLAD 2010 is held on the 28th June at the beautiful Dudhope Castle at the Abertay University, just before BNCOD, and hopes to be just as successful as its predecessors. The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers. This year, the workshop includes an invited talk given by Richard Cooper (of the University of Glasgow) who will present a discussion and some results from the Database Disciplinary Commons which was held in the UK over the academic year. Due to the healthy number of high quality submissions this year, the workshop will also present seven peer reviewed papers, and six refereed poster papers. Of the seven presented papers, three will be presented as full papers and four as short papers. These papers and posters cover a number of themes, including: approaches to teaching databases, e.g. group centered and problem based learning; use of novel case studies, e.g. forensics and XML data; techniques and approaches for improving teaching and student learning processes; assessment techniques, e.g. peer review; methods for improving students' abilities to develop database queries and develop E-R diagrams; and e-learning platforms for supporting teaching and learning.

Abertay Research Portal

Model Based Development of Quality-Aware Software Services

Author: Fernández Briones Javier
Massonet Philippe
Miguel Cabello Miguel Angel de
Silva Gallino Juan Pedro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Modelling languages and development frameworks support the functional and structural description of software architectures. Quality-aware applications, however, require languages that allow QoS to be expressed as a first-class concept during architecture design and service composition, and require existing tools and infrastructures to be extended with support for modelling, evaluating, managing and monitoring QoS aspects. In addition to a service's functional behaviour and internal structure, its developer must consider the fulfilment of its quality requirements. If the service is flexible, the output quality depends both on input quality and available resources (e.g., amounts of CPU execution time and memory). From the software engineering point of view, modelling of quality-aware requirements and architectures requires modelling support for the description of quality concepts, support for the analysis of quality properties (e.g. model checking and consistency of quality constraints, assembly of quality), and tool support for the transition from quality requirements to quality-aware architectures, and from quality-aware architectures to service run-time infrastructures. Quality management in run-time service infrastructures must support handling quality concepts dynamically. QoS-aware modelling frameworks and QoS-aware run-time management infrastructures must evolve together in order to be integrated.

Crossref

Archivo Digital UPM

Cell degradation detection based on an inter-cell approach

Author: Asghar Muhammad Zeeshan
Hämäläinen Seppo
Hämäläinen Timo
Imran Muhammad Ali
Nieminen Paavo
Ristaniemi Tapani
Publication venue: Advanced Institute of Convergence Information Technology Research Center
Publication date: 01/03/2017

Fault management is a crucial part of cellular network management systems. The status of the base stations is usually monitored by well-defined key performance indicators (KPIs). The approaches for cell degradation detection are based on either intra-cell or inter-cell analysis of the KPIs. In intra-cell analysis, KPI profiles are built from each cell's own local history data, whereas in inter-cell analysis, the KPIs of one cell are compared with the corresponding KPIs of the other cells. In this work, we argue in favor of the inter-cell approach and apply a degradation detection method that is able to detect a sleeping cell that could be difficult to observe using traditional intra-cell methods. We demonstrate its use for detecting emulated degradations among performance data recorded from a live LTE network. The method can be integrated into current systems because it can operate using existing KPIs without any major modification to the network infrastructure.

Jyväskylä University Digital Archive

Enlighten

KARL: A Knowledge-Assisted Retrieval Language

Author: Dominick Wayne D.
Triantafyllopoulos Spiros

Data classification and storage are tasks typically performed by application specialists. In contrast, information users are primarily non-computer specialists who use information in their decision-making and other activities. Interaction efficiency between such users and the computer is often reduced by machine requirements and the resulting user reluctance to use the system. This thesis examines the problems associated with information retrieval for non-computer specialist users, and proposes a method for communicating in restricted English that uses knowledge of the entities involved, relationships between entities, and basic English language syntax and semantics to translate user requests into formal queries. The proposed method includes an intelligent dictionary, syntax and semantic verifiers, and a formal query generator. In addition, the proposed system has a learning capability that can improve portability and performance. With the increasing demand for efficient human-machine communication, the significance of this thesis becomes apparent. As human resources become more valuable, software systems that assist in improving the human-machine interface will be needed, and research addressing new solutions will be of utmost importance. This thesis presents an initial design and implementation as a foundation for further research and development in the emerging field of natural language database query systems.

NASA Technical Reports Server