Search CORE

242 research outputs found

Spatial support vector regression to detect silent errors in the exascale era

Author: Balaprakash Prasanna
Bautista Gomez Leonardo
Cappello Franck
Cristal Kestelman Adrián
Di Sheng
Labarta Mancho Jesús José
Subasi Omer
Unsal Osman Sabri
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions (SDCs) or silent errors are one of the major sources that corrupt the executionresults of HPC applications without being detected. In this work, we explore a low-memory-overhead SDC detector, by leveraging epsilon-insensitive support vector machine regression, to detect SDCs that occur in HPC applications that can be characterized by an impact error bound. The key contributions are three fold. (1) Our design takes spatialfeatures (i.e., neighbouring data values for each data point in a snapshot) into training data, such that little memory overhead (less than 1%) is introduced. (2) We provide an in-depth study on the detection ability and performance with different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show thatour detector can achieve the detection sensitivity (i.e., recall) up to 99% yet suffer a less than 1% of false positive rate for most cases. Our detector incurs low performance overhead, 5% on average, for all benchmarks studied in the paper. Compared with other state-of-the-art techniques, our detector exhibits the best tradeoff considering the detection ability and overheads.This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research Program, under Contract DE-AC02-06CH11357, by FI-DGR 2013 scholarship, by HiPEAC PhD Collaboration Grant, the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402, and TIN2015-65316-P.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Toward bio-inspired information processing with networks of nano-scale switching elements

Author: Konkoli Zoran
Wendin Göran
Publication venue
Publication date: 25/11/2013
Field of study

Unconventional computing explores multi-scale platforms connecting molecular-scale devices into networks for the development of scalable neuromorphic architectures, often based on new materials and components with new functionalities. We review some work investigating the functionalities of locally connected networks of different types of switching elements as computational substrates. In particular, we discuss reservoir computing with networks of nonlinear nanoscale components. In usual neuromorphic paradigms, the network synaptic weights are adjusted as a result of a training/learning process. In reservoir computing, the non-linear network acts as a dynamical system mixing and spreading the input signals over a large state space, and only a readout layer is trained. We illustrate the most important concepts with a few examples, featuring memristor networks with time-dependent and history dependent resistances

arXiv.org e-Print Archive

Chalmers Research

ASCR/HEP Exascale Requirements Review Report

This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.Comment: 77 pages, 13 Figures; draft report, subject to further revisio

arXiv.org e-Print Archive

eScholarship - University of California

The OpenMC Monte Carlo particle transport code

Author: Forget Benoit Robert Yves
Romano Paul Kollath
Publication venue: 'Elsevier BV'
Publication date: 01/06/2012
Field of study

A new Monte Carlo code called OpenMC is currently under development at the Massachusetts Institute of Technology as a tool for simulation on high-performance computing platforms. Given that many legacy codes do not scale well on existing and future parallel computer architectures, OpenMC has been developed from scratch with a focus on high performance scalable algorithms as well as modern software design practices. The present work describes the methods used in the OpenMC code and demonstrates the performance and accuracy of the code on a variety of problems.United States. Department of Energy (DE-AC05-00OR22725

DSpace@MIT

Scalability in the Presence of Variability

Author: Kocoloski Brian
Publication venue
Publication date: 31/01/2018
Field of study

Supercomputers are used to solve some of the world’s most computationally demanding problems. Exascale systems, to be comprised of over one million cores and capable of 10^18 floating point operations per second, will probably exist by the early 2020s, and will provide unprecedented computational power for parallel computing workloads. Unfortunately, while these machines hold tremendous promise and opportunity for applications in High Performance Computing (HPC), graph processing, and machine learning, it will be a major challenge to fully realize their potential, because to do so requires balanced execution across the entire system and its millions of processing elements. When different processors take different amounts of time to perform the same amount of work, performance imbalance arises, large portions of the system sit idle, and time and energy are wasted. Larger systems incorporate more processors and thus greater opportunity for imbalance to arise, as well as larger performance/energy penalties when it does. This phenomenon is referred to as performance variability and is the focus of this dissertation. In this dissertation, we explain how to design system software to mitigate variability on large scale parallel machines. Our approaches span (1) the design, implementation, and evaluation of a new high performance operating system to reduce some classes of performance variability, (2) a new performance evaluation framework to holistically characterize key features of variability on new and emerging architectures, and (3) a distributed modeling framework that derives predictions of how and where imbalance is manifesting in order to drive reactive operations such as load balancing and speed scaling. Collectively, these efforts provide a holistic set of tools to promote scalability through the mitigation of variability

D-Scholarship@Pitt

Exploring the capabilities of support vector machines in detecting silent data corruptions

Author: Balaprakash Prasanna
Bautista-Gomez Leonardo
Cappello Franck
Cristal Kestelman Adrián
Di Sheng
Krishnamoorthy Sriram
Labarta Mancho Jesús José
Subasi Omer
Unsal Osman Sabri
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions (SDCs), or silent errors, are one of the major sources that corrupt the execution results of HPC applications without being detected. In this work, we explore a set of novel SDC detectors – by leveraging epsilon-insensitive support vector machine regression – to detect SDCs that occur in HPC applications. The key contributions are threefold. (1) Our exploration takes temporal, spatial, and spatiotemporal features into account and analyzes different detectors based on different features. (2) We provide an in-depth study on the detection ability and performance with different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show that support-vector-machine-based detectors can achieve detection sensitivity (i.e., recall) up to 99% yet suffer a less than 1% false positive rate for most cases. Our detectors incur low performance overhead, 5% on average, for all benchmarks studied in this work.This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number 66905, program manager Lucy Nowell. Pacific Northwest National Laboratory is operated by Battelle for DOE under Contract DE-AC05-76RL01830. In addition, this material is based upon work supported by the National Science Foundation under Grant No. 1619253, and also by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, program manager Lucy Nowell, under contract number DE-AC02-06CH11357 (DOE Catalog project) and in part by the European Union FEDER funds under contract TIN2015-65316-P.Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Master of Science

Author: Haran Arvind
Publication venue: University of Utah
Publication date: 01/01/2016
Field of study

thesisTo minimize resource consumption and maximize performance, computer architecture research has been investigating approaches that may compute inaccurate solutions. Such hardware inaccuracies may induce a wide variety of program behaviors which are not obs

The University of Utah: J. Willard Marriott Digital Library

Runtime-aware architectures

Author: Ayguadé Parra Eduard
Casas Guix Marc
Castillo Villar Emilio
Chasapis Dimitrios
Cristal Kestelman Adrián
Hayes Timothy
Jaulmes Luc
Labarta Mancho Jesús José
Moreto Planas Miquel
Palomar Pérez Óscar
Unsal Osman Sabri
Valero Cortés Mateo
Álvarez Martí Lluc
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

In the last few years, the traditional ways to keep the increase of hardware performance to the rate predicted by the Moore’s Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well defined Instruction Set Architecture (ISA). This simple interface allowed developing applications without worrying too much about the underlying hardware, while hardware designers were able to aggressively exploit instruction-level parallelism (ILP) in superscalar processors. Current multi-cores are designed as simple symmetric multiprocessors (SMP) on a chip. However, we believe that this is not enough to overcome all the problems that multi-cores face. The runtime system of the parallel programming model has to drive the design of future multi-cores to overcome the restrictions in terms of power, memory, programmability and resilience that multi-cores have. In the paper, we introduce an approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime’s perspective.This work has been partially supported by the European Research Council under the European Union’s 7th FP, ERC Grant Agreement number 321253, by the Spanish Ministry of Science and Innovation under grant TIN2012-34557 and by the HiPEAC Network of Excellence. M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI- 2012-15047, and M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

New Design Techniques for Dynamic Reconfigurable Architectures

Author: BOZZOLI LUDOVICA
Publication venue: country:Italy
Publication date: 28/09/2021
Field of study

L'abstract è presente nell'allegato / the abstract is in the attachmen

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)