
    The roll back chip: hardware support for distributed simulation using time warp

    Distributed simulation offers an attractive means of meeting the high computational demands of discrete event simulation programs. The Time Warp mechanism has been proposed to ensure correct sequencing of events in distributed simulation programs without blocking processes unnecessarily. However, the overhead of state saving and rollback in Time Warp is one obstacle that may severely degrade performance. A special purpose hardware component, the rollback chip (RBC), is proposed to manage the state of a processor and provide an efficient rollback mechanism within a node of a parallel computer. The chip may be viewed as a special purpose memory management unit that lies on the data path between processor and memory. The algorithm implemented by the rollback chip is described, as well as extensions to the basic design. Implementation of the chip is briefly discussed. In addition to distributed simulation, the rollback chip may be used in other applications using the Time Warp mechanism, notably distributed database concurrency control.
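
    The mechanism the abstract describes can be modeled in software. Below is a minimal, illustrative Python sketch of Time Warp-style state saving and rollback (a software analogue, not the RBC hardware design): every write to simulated memory logs the old value under the current virtual time, and a straggler event rolls back all newer writes. All names are hypothetical.

    # Software analogue of Time Warp state saving and rollback; the
    # rollback chip fills this role in hardware between processor and
    # memory. Names are hypothetical.
    class RollbackMemory:
        def __init__(self, size):
            self.mem = [0] * size
            self.log = []        # entries of (virtual_time, address, old_value)

        def write(self, vtime, addr, value):
            # Save the old value, tagged with virtual time, before overwriting.
            self.log.append((vtime, addr, self.mem[addr]))
            self.mem[addr] = value

        def rollback(self, vtime):
            # Undo every write timestamped later than vtime, newest first.
            while self.log and self.log[-1][0] > vtime:
                _, addr, old = self.log.pop()
                self.mem[addr] = old

        def commit(self, gvt):
            # Fossil collection: discard entries older than global virtual time.
            self.log = [e for e in self.log if e[0] >= gvt]

    m = RollbackMemory(8)
    m.write(10, 0, 42)
    m.write(20, 0, 99)
    m.rollback(15)       # a straggler event with timestamp 15 arrives
    assert m.mem[0] == 42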

    Submicron Systems Architecture Project: Semiannual Technical Report

    No abstract available

    High efficiency, character-oriented, local area networks


    Designs for increasing reliability while reducing energy and increasing lifetime

    In recent decades, computing technology has experienced tremendous development. For instance, transistor feature sizes have shrunk by half roughly every two years, consistently since Moore first stated his law, and consequently the number of transistors and the core count per chip double with each generation. Similarly, petascale systems capable of performing more than one quadrillion calculations per second have been developed, and exascale systems are predicted to be available around the year 2020. However, these developments face a reliability wall. Transistor feature sizes are now so small that it becomes easier for high-energy particles to temporarily flip the state of a memory cell from 1 to 0 or 0 to 1. Even if we assume the fault rate per transistor stays constant with scaling, the increase in total transistor and core count per chip will significantly increase the number of faults in future desktop and exascale systems. Moreover, circuit ageing is exacerbated by increased manufacturing variability and thermal stress, so the lifetimes of processor structures are becoming shorter. On the other hand, because of the limited power budget of computer systems such as mobile devices, it is attractive to scale down the supply voltage; however, when the voltage scales beyond the safe margin, especially to ultra-low levels, the error rate increases drastically. In addition, new memory technologies such as NAND flash offer only a limited nominal lifetime, beyond which they can no longer guarantee correct data storage, leading to data retention problems. Because of these issues, reliability has become a first-class design constraint for contemporary computing, alongside power and performance. Reliability plays an increasingly important role when computer systems process sensitive and life-critical information such as health records, financial data, power regulation, and transportation. In this thesis, we present several reliability designs for detecting and correcting errors that occur, for various reasons, in processor pipelines, L1 caches, and non-volatile NAND flash memories. Our designs serve three main purposes. First, we improve the reliability of computer systems by detecting and correcting random, unpredictable errors such as bit flips or ageing-induced errors. Second, we reduce the energy consumption of computer systems by allowing them to operate reliably at ultra-low voltage levels. Third, we increase the lifetime of new memory technologies by implementing efficient, low-cost reliability schemes.
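
    As one concrete illustration of the kind of bit-flip correction such designs build on, the textbook Hamming(7,4) code below detects and corrects a single flipped bit in Python. This is a standard example, not one of the thesis's actual schemes.

    # Textbook Hamming(7,4): 4 data bits, 3 parity bits at positions 1, 2, 4.
    def hamming74_encode(d):
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]    # codeword positions 1..7

    def hamming74_correct(c):
        # Recompute parities; the syndrome is the 1-based error position.
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        syndrome = s1 + 2 * s2 + 4 * s3
        if syndrome:
            c[syndrome - 1] ^= 1               # flip the faulty bit back
        return [c[2], c[4], c[5], c[6]]        # extract the data bits

    word = hamming74_encode([1, 0, 1, 1])
    word[5] ^= 1                               # inject a single bit flip
    assert hamming74_correct(word) == [1, 0, 1, 1]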

    Sampled simulation of task-based programs

    Sampled simulation is a mature technique for reducing the simulation time of single-threaded programs. Nevertheless, current sampling techniques do not take advantage of other execution models, such as task-based execution, to provide both more accurate and faster simulation. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models, which allow the programmer to specify program segments as tasks that are instantiated many times and scheduled dynamically to available threads. Due to variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals, we employ a novel fast-forwarding mechanism for dynamically scheduled programs. We evaluate different automatic techniques for clustering task instances and show that DBSCAN clustering combined with analytical performance modeling provides the best trade-off between simulation speed and accuracy. TaskPoint is the first technique to combine sampled simulation and analytical modeling, and it provides a new way to trade off simulation speed and accuracy. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 8 simulated threads by an average factor of 220x, at an average error of 0.5 percent and a maximum error of 7.9 percent.
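
    As a toy sketch of the general idea (not the TaskPoint implementation), the Python fragment below clusters task instances by a single feature, runs a stand-in detailed simulation only on the first instance of each cluster, and estimates the remaining instances from the representative. It assumes scikit-learn for DBSCAN; all names and numbers are illustrative.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def detailed_simulate(task):
        # Stand-in for an expensive cycle-accurate simulation.
        return task["instructions"] * 0.45     # toy cycles-per-instruction model

    tasks = [{"instructions": n} for n in (1000, 1020, 980, 5000, 5100, 4900)]
    feats = np.array([[t["instructions"]] for t in tasks], dtype=float)
    labels = DBSCAN(eps=100.0, min_samples=1).fit_predict(feats)

    total_cycles = 0.0
    cpi_of_cluster = {}
    for task, label in zip(tasks, labels):
        if label not in cpi_of_cluster:
            # First instance of this cluster: simulate in detail.
            cycles = detailed_simulate(task)
            cpi_of_cluster[label] = cycles / task["instructions"]
        # Fast-forward: estimate from the representative's CPI.
        total_cycles += cpi_of_cluster[label] * task["instructions"]

    print(f"estimated total cycles: {total_cycles:.0f}")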

    ParaMedic: Heterogeneous Parallel Error Correction

    The cost of processor error detection can be reduced significantly by exploiting the parallelism that exists in a redundant copy of an execution, even when it does not exist in the original code, to split the redundant work across a large number of small, highly efficient cores. However, such schemes do not provide a method for automatic error recovery. We develop ParaMedic, an architecture that allows efficient automatic correction of errors detected in a system by using parallel heterogeneous cores, providing a fully fail-safe system that does not propagate errors to other systems and can recover without manual intervention. It uses logging to roll back any computation that occurred after a detected error, along with a set of techniques that preserve error-checking parallelism while still preventing incorrect processor values from escaping in multicore environments, where ordering individual processors' logs is not enough to roll back execution. Across a set of single- and multi-threaded benchmarks, we achieve 3.1% and 1.5% overhead respectively, compared with 1.9% and 1% for error detection alone.
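
    The logging-based recovery idea can be pictured with a small Python model (illustrative only, not the ParaMedic architecture): stores since the last verified point are recorded in an undo log, and a detected checker mismatch rolls memory back before the chunk is re-executed.

    # Toy model of detection-plus-recovery via an undo log. The main
    # core runs ahead and logs old values; checker cores verify each
    # chunk, and a mismatch triggers rollback. Names are hypothetical.
    class CheckedExecutor:
        def __init__(self, memory):
            self.memory = memory
            self.undo = []               # (addr, old_value) since last commit

        def store(self, addr, value):
            self.undo.append((addr, self.memory[addr]))
            self.memory[addr] = value

        def commit_if_ok(self, checker_ok):
            if checker_ok:
                self.undo.clear()        # checkers agreed: stores are final
                return True
            # Error detected: undo every store since the last verified point.
            for addr, old in reversed(self.undo):
                self.memory[addr] = old
            self.undo.clear()
            return False                 # caller re-executes the chunk

    mem = {0: 0, 1: 0}
    ex = CheckedExecutor(mem)
    ex.store(0, 7)
    ex.store(1, 9)
    ex.commit_if_ok(checker_ok=False)    # checker found a mismatch
    assert mem == {0: 0, 1: 0}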

    E-QED: Electrical Bug Localization During Post-Silicon Validation Enabled by Quick Error Detection and Formal Methods

    During post-silicon validation, manufactured integrated circuits are extensively tested in actual system environments to detect design bugs. Bug localization involves identifying a bug trace (a sequence of inputs that activates and detects the bug) and the hardware design block where the bug is located. Existing bug localization practices during post-silicon validation are mostly manual and ad hoc, and hence extremely expensive and time consuming. This is particularly true for subtle electrical bugs caused by unexpected interactions between a design and its electrical state. We present E-QED, a new approach that automatically localizes electrical bugs during post-silicon validation. Our results on the OpenSPARC T2, an open-source 500-million-transistor multicore chip design, demonstrate the effectiveness and practicality of E-QED: starting with a failed post-silicon test, in a few hours (9 hours on average) we can automatically narrow the location of the bug to (the fan-in logic cone of) a handful of candidate flip-flops (18 flip-flops on average, for a design with roughly 1 million flip-flops) and also obtain the corresponding bug trace. The area impact of E-QED is about 2.5%. In contrast, determining this same information might take weeks or even months of mostly manual work using traditional approaches.
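
    The fan-in-cone narrowing can be pictured with a toy netlist graph: starting from a flip-flop whose captured value mismatches, walk the driving edges backward to collect the candidate logic. The Python sketch below is generic graph code under that assumption, not E-QED's actual combination of QED tests and formal analysis.

    # Toy fan-in cone extraction over a netlist represented as a
    # dependency graph (signal -> signals that drive it).
    from collections import deque

    def fan_in_cone(drivers, start):
        cone, queue = set(), deque([start])
        while queue:
            sig = queue.popleft()
            for src in drivers.get(sig, ()):
                if src not in cone:
                    cone.add(src)
                    queue.append(src)
        return cone

    drivers = {"ff9": ["and3"], "and3": ["ff1", "xor2"], "xor2": ["ff4", "in0"]}
    print(fan_in_cone(drivers, "ff9"))   # and3, ff1, xor2, ff4, in0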

    Smart Bike Aftermarket System (SBAMS)

    The Smart Bike Aftermarket System will be a set of connected modules which, when installed on a standard bicycle, allow it to mimic some of the safety and quality-of-life features of an e-bike. The most notable safety feature is the ability to detect vehicles approaching from behind and alert the rider to potential collisions. The system will also provide lighting (headlights, taillights, and turn signals) and a companion application through which a user can view more detailed information about their cycling. The primary advantage of this system over a standard e-bike is its lower barrier to entry, since it does not require the purchase of a new bicycle. This improves the ability of users with limited resources to keep themselves safe while cycling, saving lives and avoiding injuries.
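
    In its simplest form, the rear-approach warning reduces to a time-to-collision estimate from successive readings of a rear-facing range sensor, as in the hypothetical Python sketch below; the sensor interface and the 3-second threshold are illustrative assumptions, not the SBAMS design.

    # Hypothetical rear-approach alert: estimate closing speed from two
    # range readings and warn if time to collision drops below a threshold.
    def time_to_collision(prev_m, curr_m, dt_s):
        closing_speed = (prev_m - curr_m) / dt_s   # m/s, > 0 means approaching
        if closing_speed <= 0:
            return float("inf")                    # not closing in
        return curr_m / closing_speed

    ALERT_TTC_S = 3.0                              # assumed alert threshold

    def should_alert(prev_m, curr_m, dt_s):
        return time_to_collision(prev_m, curr_m, dt_s) < ALERT_TTC_S

    assert should_alert(20.0, 15.0, 0.5)       # closing at 10 m/s, TTC = 1.5 s
    assert not should_alert(20.0, 19.9, 0.5)   # closing at 0.2 m/s, TTC ~ 100 s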

    Voting margin: A scheme for error-tolerant k nearest neighbors classifiers for machine learning

    Machine learning (ML) techniques such as classifiers are used in many applications, some of which are related to safety or critical systems. In such cases, correct processing is a strict requirement, and thus ML algorithms (such as those for classification) must be error tolerant. A naive approach to implementing error-tolerant classifiers is to resort to general protection techniques such as modular redundancy. However, modular redundancy incurs large overheads in metrics such as hardware utilization and power consumption, which may not be acceptable in applications that run on embedded or battery-powered systems. Another option is to exploit the algorithmic properties of the classifier to provide protection and error tolerance at a lower cost. This paper explores this approach for a widely used classifier, the k nearest neighbors (kNN) classifier, and proposes an efficient scheme to protect it against errors. The proposed technique is based on time-based modular redundancy (TBMR) and exploits the intrinsic redundancy of kNN to drastically reduce the number of re-computations needed to detect errors. This is achieved by noting that when the vote among the k nearest neighbors has a large majority, an error in one of the voters cannot change the result; hence the name voting margin (VM). This observation has been refined and extended in the proposed VM scheme to also avoid re-computations in some cases in which the majority vote is tight. The VM scheme has been implemented and evaluated with publicly available data sets that cover a wide range of applications and settings. The results show that by exploiting the intrinsic redundancy of the classifier, the proposed scheme reduces the cost compared to modular redundancy by more than 60 percent in all configurations evaluated. Pedro Reviriego and José Alberto Hernández would like to acknowledge the support of the TEXEO project TEC2016-80339-R, funded by the Spanish Ministry of Economy and Competitiveness, and of the Madrid Community research project TAPIR-CM, Grant no. P2018/TCS-4496.
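
    The core observation translates directly into code: a single erroneous voter can change the winner's margin by at most two votes, so when the margin is larger than that, re-computation can be skipped. Below is a minimal Python sketch of this margin test; the refinements the paper adds for tight votes are not modeled.

    # Minimal sketch of the voting-margin idea: re-run the expensive
    # neighbor search only when one faulty vote could change the result.
    from collections import Counter

    def knn_classify_with_vm(neighbor_labels, recompute):
        counts = Counter(neighbor_labels).most_common()
        winner, top = counts[0]
        second = counts[1][1] if len(counts) > 1 else 0
        # One faulty voter shifts the margin by at most 2: one vote
        # leaves the winning class and one joins another class.
        if top - second > 2:
            return winner              # safe: skip the re-computation
        return recompute()             # tight vote: re-run to verify

    labels = ["a", "a", "a", "a", "b"]                 # k = 5, margin 4 - 1 = 3
    print(knn_classify_with_vm(labels, lambda: "a"))   # 'a', no re-run needed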

    Reconfigurable RRAM-based computing: A case study for reliability enhancement

    Emerging hybrid-CMOS nanoscale devices and architectures offer a greater degree of integration and higher performance. However, high power densities, frequent hard errors, process variations, and device wearout affect overall system reliability. Reactive design techniques, such as redundancy, account for component failures by mitigating them to prevent system failures, but they incur high area and power overheads. This research explores hybrid CMOS/resistive RAM (RRAM) architectures that enhance system reliability by performing computation in the RRAM cache whenever CMOS logic units fail, essentially masking the area overhead of redundant logic when it is not in use. The proposed designs are validated using the gem5 performance simulator and the McPAT power simulator running single-core SPEC2006 benchmarks and multi-core PARSEC benchmarks, and the simulation results are used to evaluate the efficacy of the RRAM-based reliability enhancement techniques. The average runtime when using RRAM for functional-unit replacement was between ~1.5 and ~2.5 times longer than the baseline for a single-core architecture, between ~1.25 and ~2 times longer for an 8-core architecture, and between ~1.2 and ~1.5 times longer for a 16-core architecture. Average energy consumption when using RRAM for functional-unit replacement was between ~2 and ~5 times higher than the baseline for a single-core architecture, and between ~1.25 and ~2.75 times higher for multi-core architectures. The performance degradation and increased energy consumption are justified by the prevention of system failure and the enhanced reliability. Overall, the proposed architecture shows promise for multi-core systems: average performance degradation decreases as more cores are used, because more functional units are available in total, preventing a slow RRAM functional unit from becoming a bottleneck.
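
    The reconfiguration idea can be sketched as a dispatch policy (a hypothetical Python model, not the paper's gem5/McPAT setup): operations normally execute on the fast CMOS functional unit, and once that unit is marked failed they are routed to a slower RRAM-based computation instead of failing the core. Latencies are assumed.

    # Toy model of RRAM-backed fault masking: a failed CMOS functional
    # unit is replaced by a slower RRAM-based computation rather than
    # bringing the core down. Latencies and names are illustrative.
    CMOS_LATENCY_CYCLES = 1
    RRAM_LATENCY_CYCLES = 3      # assumed slower in-cache computation

    class Core:
        def __init__(self):
            self.alu_failed = False

        def add(self, a, b):
            if not self.alu_failed:
                return a + b, CMOS_LATENCY_CYCLES
            # CMOS ALU worn out: fall back to the RRAM unit.
            return a + b, RRAM_LATENCY_CYCLES

    core = Core()
    print(core.add(2, 3))        # (5, 1) on the healthy CMOS unit
    core.alu_failed = True       # wearout detected
    print(core.add(2, 3))        # (5, 3) masked by the RRAM unit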