
    Fault-free performance validation of fault-tolerant multiprocessors

    A validation methodology for testing the performance of fault-tolerant computer systems was developed and applied to the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB facility. This methodology was claimed to be general enough to apply to any ultrareliable computer system. The goal of this research was to extend the validation methodology and to demonstrate its robustness through more extensive application to NASA's Fault-Tolerant Multiprocessor System (FTMP) and to the Software Implemented Fault-Tolerance (SIFT) Computer System. Furthermore, the performance of the two multiprocessors was compared by conducting similar experiments on each. An analysis of the results shows that high-level-language instruction execution times for both SIFT and FTMP were consistent and predictable, with SIFT having greater throughput. At the operating-system level, FTMP consumes 60% of its throughput on the real-time dispatcher and 5% on fault-handling tasks; in contrast, SIFT consumes only 16% of its throughput on the dispatcher but 66% on fault-handling software overhead.
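    To make the overhead comparison concrete, the following sketch (purely illustrative, using only the percentages quoted above) computes the fraction of throughput each system retains for application work.

        # Residual application throughput implied by the reported OS-level overheads.
        overheads = {
            "FTMP": {"dispatcher": 0.60, "fault_handling": 0.05},
            "SIFT": {"dispatcher": 0.16, "fault_handling": 0.66},
        }
        for system, parts in overheads.items():
            residual = 1.0 - sum(parts.values())
            print(f"{system}: {residual:.0%} of throughput left for application work")
        # FTMP: 35% of throughput left for application work
        # SIFT: 18% of throughput left for application work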

    Optimal pre-scheduling of problem remappings

    A large class of scientific computational problems can be characterized as a sequence of steps where a significant amount of computation occurs at each step, but the work performed at each step is not necessarily identical. Two good examples of this type of computation are: (1) regridding methods, which change the problem discretization during the course of the computation, and (2) methods for solving sparse triangular systems of linear equations. Recent work has investigated a means of mapping such computations onto parallel processors; the method defines a family of static mappings with differing degrees of importance placed on the conflicting goals of good load balance and low communication/synchronization overhead. The performance tradeoffs are controllable by adjusting the parameters of the mapping method. To achieve good performance it may be necessary to change these parameters dynamically at run time, but such changes can impose additional costs. If the computation's behavior can be determined prior to its execution, it is possible to construct an optimal parameter schedule using a low-order-polynomial-time dynamic programming algorithm, as sketched below. Since even this can be expensive, the effect of a linear-time scheduling heuristic on one of the model problems is also studied, and the heuristic is shown to be effective and nearly optimal.
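    A minimal sketch of that dynamic program follows, assuming a hypothetical cost model: a table step_cost[t][p] giving the cost of executing step t under mapping parameter p, and a fixed cost remap_cost for changing the parameter between steps. The names and the O(T * P^2) recurrence are illustrative, not the paper's actual formulation.

        # Dynamic program: cheapest parameter schedule over T steps and P settings.
        def optimal_schedule(step_cost, remap_cost):
            """step_cost[t][p] = cost of step t under parameter p (hypothetical model).
            Returns (minimum total cost, parameter schedule of length T)."""
            T, P = len(step_cost), len(step_cost[0])
            best = list(step_cost[0])            # best[p]: min cost of steps 0..t ending at p
            back = [[0] * P for _ in range(T)]   # backpointers to recover the schedule
            for t in range(1, T):
                new = [0.0] * P
                for p in range(P):
                    # Either keep parameter p or switch from the cheapest predecessor q.
                    c, q = min((best[q] + (0 if q == p else remap_cost), q)
                               for q in range(P))
                    new[p] = c + step_cost[t][p]
                    back[t][p] = q
                best = new
            p = min(range(P), key=lambda i: best[i])
            sched = [p]
            for t in range(T - 1, 0, -1):        # walk backpointers from the last step
                p = back[t][p]
                sched.append(p)
            return min(best), sched[::-1]

        # Example: 4 steps, 2 parameter settings, remapping costs 3 units.
        cost = [[1, 4], [1, 4], [5, 1], [5, 1]]
        print(optimal_schedule(cost, remap_cost=3))   # -> (7, [0, 0, 1, 1])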

    Profiling I/O interrupts in modern architectures

    As applications grow increasingly communication-oriented, interrupt performance quickly becomes a crucial component of high-performance I/O system design. At the same time, accurately measuring interrupt handler performance is difficult with the traditional simulation, instrumentation, or statistical-sampling approaches. One of the most important components of interrupt performance is cache behavior. This paper presents a portable method for measuring the cache effects of I/O interrupt handling using native hardware performance counters. To provide a portability stress test, the method is demonstrated on two commercial platforms with different architectures, the SGI Origin 200 and the Sun Ultra-1. This case study uses the methodology to measure the overhead of the two most common forms of interrupt traffic: disk and network interrupts. The study demonstrates that the method works well and is reasonably robust. In addition, the results show that disk interrupts behave similarly on both platforms, while differences in OS organization cause network interrupts to behave very differently. Furthermore, network interrupts exhibit significantly larger cache footprints.
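    As a rough modern analogue of the counter-based technique (and not the paper's own instrumentation of the SGI and Sun counter interfaces), hardware cache counters can be sampled around an interrupt-heavy workload with Linux's perf tool; the helper and workload below are illustrative.

        import subprocess

        def cache_stats(cmd):
            """Run cmd under `perf stat` and return the counter summary,
            which perf writes to stderr. May require elevated privileges."""
            result = subprocess.run(
                ["perf", "stat", "-e", "cache-references,cache-misses", "--"] + cmd,
                capture_output=True, text=True,
            )
            return result.stderr

        # A disk-interrupt-heavy workload: write 64 MB and force it to disk.
        print(cache_stats(["dd", "if=/dev/zero", "of=/tmp/testfile",
                           "bs=1M", "count=64", "conv=fsync"]))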

    Exploiting Data Representation for Fault Tolerance

    We explore the link between data representation and soft errors in dot products. We present an analytic model for the absolute error introduced should a soft error corrupt a bit in an IEEE-754 floating-point number. We show how this finding relates to the fundamental linear algebra concepts of normalization and matrix equilibration. We present a case study illustrating that the probability of experiencing a large error in a dot product is minimized when both vectors are normalized. Furthermore, when data is normalized we show that the absolute error is either less than one or very large, which allows large errors to be detected. We demonstrate how this finding can be used by instrumenting the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase, and show that when scaling is used the absolute error can be bounded above by one.
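    The dichotomy is easy to observe directly. The sketch below (an illustrative experiment, not the paper's analytic model) flips single bits of an IEEE-754 double with magnitude at most one and prints the resulting absolute error.

        import struct

        def flip_bit(x: float, bit: int) -> float:
            """Flip one bit of a double (0-51 significand, 52-62 exponent, 63 sign)."""
            (bits,) = struct.unpack("<Q", struct.pack("<d", x))
            (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
            return y

        x = 0.75  # normalized-magnitude datum, |x| <= 1
        for bit in (10, 40, 51, 55, 62):  # low/high significand and exponent bits
            print(f"bit {bit:2d}: |error| = {abs(flip_bit(x, bit) - x):.3e}")
        # Significand flips give errors below one; an exponent flip either shrinks
        # the value (error still below one) or inflates it by hundreds of orders
        # of magnitude -- matching the "less than one or very large" claim above.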

    Tree-Searching Algorithms on Parallel Architectures


    04231 Abstracts Collection -- Scheduling in Computer and Manufacturing Systems

    From 31.05.04 to 04.06.04, the Dagstuhl Seminar 04231 "Scheduling in Computer and Manufacturing Systems" was held at the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.