    Mixed-mode multicore reliability

    Future processors are expected to observe increasing rates of hardware faults. Using Dual-Modular Redundancy (DMR), two cores of a multicore can be loosely coupled to redundantly execute a single software thread, providing very high coverage against many different sources of faults. This reliability, however, comes at a high price in terms of per-thread IPC and overall system throughput. We make the observation that a user may want to run, on the same machine at the same time, both applications requiring high reliability, such as financial software, and more fault-tolerant applications requiring high performance, such as media or web software. Yet a traditional DMR system must fully operate in redundant mode whenever any application requires high reliability. This paper proposes a Mixed-Mode Multicore (MMM), which enables most applications, including the system software, to run with high reliability in DMR mode, while applications that need high performance can avoid the penalty of DMR. Though conceptually simple, two key challenges arise: 1) care must be taken to protect reliable applications from any faults occurring to applications running in high-performance mode, and 2) the desire to execute additional independent software threads for a performance application complicates the scheduling of computation to cores. After solving these issues, an MMM is shown to improve overall system performance, compared to a traditional DMR system, by approximately 2X when one reliable and one performance application are concurrently executing.
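
    The following is a minimal, hypothetical sketch (in C) of the core-assignment idea described above, not the paper's mechanism: reliable applications are given a pair of loosely coupled cores running in DMR, while performance applications each get a single core. All names (app_t, MODE_RELIABLE, and so on) are invented for illustration.

    /* Illustrative sketch only: assigning cores to applications in a
     * hypothetical mixed-mode multicore. Reliable applications get a pair of
     * cores running redundantly; performance applications get single cores. */
    #include <stdio.h>

    #define NUM_CORES 8

    typedef enum { MODE_RELIABLE, MODE_PERFORMANCE } app_mode_t;

    typedef struct {
        const char *name;
        app_mode_t mode;
    } app_t;

    int main(void) {
        app_t apps[] = {
            { "banking",      MODE_RELIABLE    },   /* needs DMR coverage     */
            { "media_player", MODE_PERFORMANCE },   /* tolerates rare faults  */
        };
        int next_core = 0;

        for (unsigned i = 0; i < sizeof apps / sizeof apps[0]; i++) {
            if (apps[i].mode == MODE_RELIABLE && next_core + 1 < NUM_CORES) {
                /* Two loosely coupled cores redundantly execute one thread. */
                printf("%s -> cores %d+%d (DMR pair)\n",
                       apps[i].name, next_core, next_core + 1);
                next_core += 2;
            } else if (next_core < NUM_CORES) {
                /* Performance mode: one core per thread, no redundancy. */
                printf("%s -> core %d (performance mode)\n",
                       apps[i].name, next_core);
                next_core += 1;
            }
        }
        return 0;
    }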

    SMCV: a Methodology for Detecting Transient Faults in Multicore Clusters

    The challenge of improving the performance of current processors is met by increasing the integration scale. This carries a growing vulnerability to transient faults, whose impact increases on multicore clusters running large scientific parallel applications. The requirement for enhancing the reliability of these systems, coupled with the high cost of re-running an application from the beginning, creates the motivation for software strategies specific to the target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications by validating the contents of the messages to be sent, preventing the transmission of errors to other processes and leveraging the intrinsic hardware redundancy of the multicore. SMCV achieves wide robustness against transient faults with reduced overhead, and accomplishes a trade-off between moderate detection latency and low additional workload.
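
    Below is a rough sketch of the message-validation idea as described in the abstract, not the SMCV implementation itself: the outgoing payload is computed twice and compared before MPI_Send, so a corrupted message is never transmitted to other processes. The helper names (compute_payload, validated_send) and the recovery action are assumptions; compile with an MPI wrapper compiler such as mpicc.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical: the same computation run by the main thread and a replica. */
    static void compute_payload(double *buf, int n, int seed) {
        for (int i = 0; i < n; i++)
            buf[i] = (double)(i + seed) * 0.5;
    }

    /* Validate message contents before they leave the process. */
    static int validated_send(int n, int dest, int tag) {
        double *leader  = malloc(n * sizeof *leader);
        double *replica = malloc(n * sizeof *replica);

        compute_payload(leader,  n, 1);   /* main thread's copy      */
        compute_payload(replica, n, 1);   /* redundant thread's copy */

        if (memcmp(leader, replica, n * sizeof *leader) != 0) {
            free(leader); free(replica);
            return -1;                    /* fault detected: do not transmit */
        }
        int rc = MPI_Send(leader, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
        free(leader); free(replica);
        return rc;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size >= 2) {
            if (rank == 0) {
                validated_send(16, 1, 0);
            } else if (rank == 1) {
                double buf[16];
                MPI_Recv(buf, 16, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            }
        }
        MPI_Finalize();
        return 0;
    }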

    Characterizing the Effects of Intermittent Faults on a Processor for Dependability Enhancement Strategy

    As semiconductor technology scales into the nanometer regime, intermittent faults have become an increasing threat. This paper focuses on the effects of intermittent faults on NET versus REG on the one hand and the implications for a dependability strategy on the other. First, the vulnerability characteristics of representative units in OpenSPARC T2 are revealed, and in particular the highly sensitive modules are identified. Second, an architecture-level dependability enhancement strategy is proposed, showing that events such as core/strand running status and core-memory interface events can be candidates for detectable symptoms. A simple watchdog can be deployed to detect application running status (IEXE event), and the SDC (silent data corruption) rate is then evaluated, demonstrating its potential. Third and last, the effects of traditional protection schemes in the target CMT against intermittent faults are quantitatively studied in terms of the contribution of each trap type, demonstrating the necessity of taking this factor into account in the strategy.
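
    The abstract's "simple watchdog ... to detect application running status" suggests the following user-level illustration, which is only a toy analogue of the proposed architecture-level mechanism: a worker thread advances a heartbeat counter, and a watchdog thread flags the strand as stalled if the counter stops moving. All names are hypothetical.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    static atomic_ulong heartbeat;      /* incremented while the strand runs */
    static atomic_int   done;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 5; i++) {
            atomic_fetch_add(&heartbeat, 1);   /* "still making progress" */
            usleep(100 * 1000);
        }
        atomic_store(&done, 1);
        return NULL;
    }

    static void *watchdog(void *arg) {
        (void)arg;
        unsigned long last = 0;
        while (!atomic_load(&done)) {
            usleep(300 * 1000);                /* watchdog period          */
            unsigned long now = atomic_load(&heartbeat);
            if (now == last)                   /* no progress observed     */
                fprintf(stderr, "watchdog: strand appears stalled\n");
            last = now;
        }
        return NULL;
    }

    int main(void) {
        pthread_t w, d;
        pthread_create(&w, NULL, worker, NULL);
        pthread_create(&d, NULL, watchdog, NULL);
        pthread_join(w, NULL);
        pthread_join(d, NULL);
        return 0;
    }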

    Thesis proposal: handling transient faults in multicore cluster environments

    The goal of improving the performance of current computers has brought the challenge of using a larger number of transistors (higher density) and increasing the operating frequency, together with a reduction in the supply voltage. All of this translates into higher temperatures and a larger amount of interference from the environment affecting the processors. In addition, with the advent of multicores and manycores, several processing cores have been integrated on the same chip. The combination of all these factors makes computers increasingly less robust against the occurrence of transient faults. This thesis work focuses on handling transient faults that occur in the internal registers of the cores that make up a current processor, in the context of a multicore cluster running a compute-intensive scientific application. These faults can affect data as well as instructions or addresses. The focus is on silent faults, those that produce data corruptions that alter the execution of the program without causing violations detectable at the operating-system level. When such a fault occurs, the program executes with erroneous parameters and therefore delivers incorrect results. In this context, the goal of the thesis work is the design and development of a middleware system that detects and tolerates transient faults in a multicore cluster environment, transparently to the user, while maintaining a specified level of robustness and optimizing resource utilization on the multicores to minimize the inefficiency implied by replicating and comparing the entire execution. Presented at the Encuentro de Tesistas de Postgrado. Red de Universidades con Carreras en Informática (RedUNCI).
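
    A purely illustrative sketch of the replicate-and-compare idea the proposal builds on (not the proposed middleware, which would do this transparently): the same computation runs in two threads of a multicore and the results are compared, so a silent data corruption that hits only one replica becomes detectable. All names are made up for illustration.

    #include <pthread.h>
    #include <stdio.h>

    static double dot_product(const double *a, const double *b, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }

    struct task { const double *a, *b; int n; double result; };

    static void *replica(void *p) {
        struct task *t = p;
        t->result = dot_product(t->a, t->b, t->n);
        return NULL;
    }

    int main(void) {
        double a[4] = { 1, 2, 3, 4 }, b[4] = { 4, 3, 2, 1 };
        struct task t0 = { a, b, 4, 0 }, t1 = { a, b, 4, 0 };
        pthread_t r0, r1;

        pthread_create(&r0, NULL, replica, &t0);   /* main replica      */
        pthread_create(&r1, NULL, replica, &t1);   /* redundant replica */
        pthread_join(r0, NULL);
        pthread_join(r1, NULL);

        if (t0.result != t1.result)
            fprintf(stderr, "silent data corruption detected\n");
        else
            printf("results agree: %g\n", t0.result);
        return 0;
    }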

    Thin Hypervisor-Based Security Architectures for Embedded Platforms

    Virtualization has grown increasingly popular, thanks to its benefits of isolation, management, and utilization, supported by hardware advances. It is also receiving attention for its potential to support security, through hypervisor-based services and advanced protections supplied to guests. Today, virtualization is even making inroads in the embedded space, and embedded systems, with their security needs, have already started to benefit from virtualization's security potential. In this thesis, we investigate the possibilities for thin hypervisor-based security on embedded platforms. In addition to a significant background study, we present the implementation of a low-footprint, thin hypervisor capable of providing security protections to a single FreeRTOS guest kernel on ARM. Backed by performance test results, our hypervisor provides security to a formerly unsecured kernel with minimal performance overhead, and represents a first step in a greater research effort into the security advantages and possibilities of embedded thin hypervisors. Our results show that thin hypervisors are both feasible and beneficial even on limited embedded systems, and they set the stage for more advanced investigations, implementations, and security applications in the future.

    Most Progress Made Algorithm: Combating Synchronization Induced Performance Loss on Salvaged Chip Multi-Processors

    Recent increases in hard fault rates in modern chip multi-processors have led to a variety of approaches for saving manufacturing yield. Among these are fine-grain fault tolerance (such as error-correction coding, redundant cache lines, and redundant functional units) and large-grain fault tolerance (such as disabling faulty cores, adding extra cores, and core salvaging techniques). This paper considers the case of core salvaging techniques and the heterogeneous performance introduced when such a system has some salvaged and some non-faulty cores. It proposes a hypervisor-based hardware thread scheduler, triggered by the detection of spin locks and thread imbalance, that mitigates the loss of throughput resulting from this heterogeneity. Specifically, a new algorithm, called the Most Progress Made algorithm, reduces the number of synchronization locks held on a salvaged core and balances the time each thread in an application spends running on that core. For some benchmarks, the results show as much as a 2.68x increase in performance over a salvaged chip multi-processor without this technique.
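
    One possible reading of the Most Progress Made idea, sketched below as a hypothetical selection routine rather than the paper's algorithm: among runnable threads, the salvaged (slower) core receives a thread that has made the most progress and does not hold a synchronization lock, with ties broken toward the thread that has spent the least time there. The progress metric and the struct fields are assumptions.

    #include <stdio.h>

    typedef struct {
        int  id;
        long progress;            /* e.g. retired instructions or barriers passed */
        long time_on_salvaged;    /* accumulated time on the salvaged core        */
        int  holds_lock;          /* avoid parking lock holders on the slow core  */
    } thread_info_t;

    static int pick_for_salvaged_core(const thread_info_t *t, int n) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (t[i].holds_lock)
                continue;
            if (best < 0 ||
                t[i].progress > t[best].progress ||
                (t[i].progress == t[best].progress &&
                 t[i].time_on_salvaged < t[best].time_on_salvaged))
                best = i;
        }
        return best;   /* -1 if every runnable thread holds a lock */
    }

    int main(void) {
        thread_info_t threads[] = {
            { 0, 900, 10, 0 },
            { 1, 950,  2, 1 },    /* ahead, but holds a lock  */
            { 2, 970,  8, 0 },    /* most progress, eligible  */
        };
        printf("salvaged core gets thread %d\n",
               pick_for_salvaged_core(threads, 3));
        return 0;
    }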

    SIMD-Swift: Improving Performance of Swift Fault Detection

    The general tendency in modern hardware is an increase in fault rates, caused by decreased operating voltages and feature sizes. Previously, the issue of hardware faults was mainly addressed only in high-availability enterprise servers and in safety-critical applications, such as the transport or aerospace domains. These fields generally have very tight requirements, but also higher budgets. However, as fault rates increase, fault tolerance solutions are starting to be required also in applications with much smaller profit margins. This brings to the front the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques in commodity CPUs, which makes resilience available almost for free. Current solutions, however, are lacking in performance, even though they show quite good fault tolerance results. This thesis explores the idea of using Single Instruction Multiple Data (SIMD) technology to execute all of a program's operations on two copies of the same data. The idea is based on the observation that SIMD is ubiquitous in modern CPUs and is usually an underutilized resource. It allows us to detect bit-flips in hardware by a simple comparison of the two copies, under the assumption that only one copy is affected by a fault. We implemented this idea as a source-to-source compiler which hardens a program at the source-code level. The evaluation of our several implementations shows that the approach is beneficial for applications dominated by arithmetic or logical operations, whereas those with more control-flow or memory operations actually perform better with regular instruction replication. For example, we achieved only 15% performance overhead on the Fast Fourier Transform benchmark, which is dominated by arithmetic instructions, but the memory-access-dominated Dijkstra algorithm showed a high overhead of 200%.
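
    The hardened code produced by the described source-to-source compiler is not shown in the abstract; the following hand-written C fragment only illustrates the underlying duplicate-in-SIMD-lanes idea: both operands are replicated in the two 64-bit lanes of an SSE register, the operation is performed once, and the lanes are compared to detect a bit-flip that hits a single copy. Requires an x86 CPU with SSE2; all function names are invented.

    #include <emmintrin.h>
    #include <stdio.h>

    /* Add two doubles with both operands duplicated across the two lanes of
     * an SSE register; a fault affecting only one lane makes the lanes
     * disagree and is reported to the caller. */
    static int checked_add(double a, double b, double *out) {
        __m128d va  = _mm_set1_pd(a);        /* {a, a}     */
        __m128d vb  = _mm_set1_pd(b);        /* {b, b}     */
        __m128d sum = _mm_add_pd(va, vb);    /* {a+b, a+b} */

        /* Compare the two copies by matching the register against its
         * lane-swapped self. */
        __m128d eq = _mm_cmpeq_pd(sum, _mm_shuffle_pd(sum, sum, 0x1));
        if (_mm_movemask_pd(eq) != 0x3)
            return -1;                       /* lanes disagree: fault detected */

        _mm_store_sd(out, sum);              /* low lane holds the result      */
        return 0;
    }

    int main(void) {
        double r;
        if (checked_add(1.5, 2.25, &r) == 0)
            printf("result %g, no fault detected\n", r);
        else
            printf("fault detected\n");
        return 0;
    }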

    PARALLEL EXECUTION TRACING: AN ALTERNATIVE SOLUTION TO EXPLOIT UNDER-UTILIZED RESOURCES IN MULTI-CORE ARCHITECTURES FOR CONTROL-FLOW CHECKING

    In this paper, a software behavior-based technique is presented to detect control-flow errors in multi-core architectures. The proposed technique is motivated by a key observation: under-utilized CPU resources in multi-core processors can be employed to check the execution flow of programs concurrently and in parallel with the main execution. To evaluate the proposed technique, a quad-core processor system was used as the simulation environment, and the behavior of SPEC CPU2006 benchmarks was studied as the target for comparison with conventional techniques. The experimental results, with regard to both detection coverage and performance overhead, demonstrate that on average about 94% of control-flow errors can be detected by the proposed technique, and more efficiently than with conventional techniques. This article has been retracted. Link to the retraction: http://casopisi.junis.ni.ac.rs/index.php/FUElectEnerg/article/view/337
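
    A toy illustration of the general approach, not the paper's (retracted) technique: the main thread logs the ID of each basic block it enters, and a checker thread running on an otherwise under-utilized core verifies that every logged transition is a legal edge of a made-up control-flow graph. All names and the graph itself are hypothetical.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NBLOCKS   4
    #define TRACE_LEN 64

    /* Adjacency matrix of legal basic-block transitions (invented program). */
    static const int legal[NBLOCKS][NBLOCKS] = {
        /* from 0 */ { 0, 1, 1, 0 },
        /* from 1 */ { 0, 0, 0, 1 },
        /* from 2 */ { 0, 0, 0, 1 },
        /* from 3 */ { 0, 0, 0, 0 },
    };

    static int trace[TRACE_LEN];
    static atomic_int trace_len;

    static void enter_block(int id) {        /* instrumented at block entry */
        int idx = atomic_load(&trace_len);
        trace[idx] = id;
        atomic_store(&trace_len, idx + 1);   /* publish after the write     */
    }

    static void *checker(void *arg) {        /* runs on a spare core        */
        (void)arg;
        int checked = 1;
        for (;;) {
            int n = atomic_load(&trace_len);
            while (checked < n) {
                int from = trace[checked - 1], to = trace[checked];
                if (!legal[from][to])
                    fprintf(stderr, "control-flow error: %d -> %d\n", from, to);
                checked++;
            }
            if (n > 0 && trace[n - 1] == NBLOCKS - 1)   /* exit block seen  */
                return NULL;
        }
    }

    int main(void) {
        pthread_t c;
        pthread_create(&c, NULL, checker, NULL);
        enter_block(0);                      /* legal path: 0 -> 2 -> 3     */
        enter_block(2);
        enter_block(3);
        pthread_join(c, NULL);
        return 0;
    }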