Search CORE

9 research outputs found

Quasi-) thread-safe PVM and (quasi-) thread-safe MPI without active polling

Author: A. Ferrari
H. Zhou
K.R. Subramaniam
Publication venue: Springer
Publication date: 01/01/2002
Field of study

Abstract. PVM (the current version 3.4) as well as many current MPI implementations force application programmers to use active polling (also known as busy waiting) in larger parallel programs. This serious problem is related to thread-unsafety of these communication libraries. While the MPI specification is very careful in this respect, the implementations are not. We present a new mechanism of interruptable blocking receive which makes PVM and MPI quasi-thread-safe. This mechanism does not require any modifications to the existing semantics of PVM or MPI (we only extend the interfaces with two new functions) and allows writing multi-threaded programs without active polling. Then we sketch how the interrupt mechanism can be hidden in the implementations of PVM and MPI, making both PVM and MPI completely threadsafe without active polling. Results of our experiments promise a significant speedup for all larger communication-intensive parallel applications

CiteSeerX

Crossref

HPCCP/CAS Workshop Proceedings 1998

Author: Mata Ellen
Schulbach Catherine
Schulbach Catherine
Publication venue
Publication date
Field of study

This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey

NASA Technical Reports Server

Advanced neural networks : finance, forecast, and other applications

Author: Mettenheim Hans-Jörg Henri von
Publication venue: Hannover : Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2010
Field of study

[no abstract

Institutionelles Repositorium der Leibniz Universität Hannover

Evaluating the performance of distributed agreement algorithms:tools, methodology and case studies

Author: Urbán Péter
Publication venue: Lausanne, EPFL
Publication date: 16/03/2005
Field of study

Nowadays, networked computers are present in most aspects of everyday life. Moreover, essential parts of society come to depend on distributed systems formed of networked computers, thus making such systems secure and fault tolerant is a top priority. If the particular fault tolerance requirement is high availability, replication of components is a natural choice. Replication is a difficult problem as the state of the replicas must be kept consistent even if some replicas fail, and because in distributed systems, relying on centralized control or a certain timing behavior is often not feasible. Replication in distributed systems is often implemented using group communication. Group communication is concerned with providing high-level multipoint communication primitives and the associated tools. Most often, an emphasis is put on tolerating crash failures of processes. At the heart of most communication primitives lies an agreement problem: the members of a group must agree on things like the set of messages to be delivered to the application, the delivery order of messages, or the set of processes that crashed. A lot of algorithms to solve agreement problems have been proposed and their correctness proven. However, performance aspects of agreement algorithms have been somewhat neglected, for a variety of reasons: the lack of theoretical and practical tools to help performance evaluation, and the lack of well-defined benchmarks for agreement algorithms. Also, most performance studies focus on analyzing failure free runs only. In our view, the limited understanding of performance aspects, in both failure free scenarios and scenarios with failure handling, is an obstacle for adopting agreement protocols in practice, and is part of the explanation why such protocols are not in widespread use in the industry today. The main goal of this thesis is to advance the state of the art in this field. The thesis has major contributions in three domains: new tools, methodology and performance studies. As for new tools, a simulation and prototyping framework offers a practical tool, and some new complexity metrics a theoretical tool for the performance evaluation of agreement algorithms. As for methodology, the thesis proposes a set of well-defined benchmarks for atomic broadcast algorithms (such algorithms are important as they provide the basis for a number of replication techniques). Finally, three studies are presented that investigate important performance issues with agreement algorithms. The prototyping and simulation framework simplifies the tedious task of developing algorithms based on message passing, the communication model that most agreement algorithms are written for. In this framework, the same implementation can be reused for simulations and performance measurements on a real network. This characteristic greatly eases the task of validating simulation results with measurements (or vice versa). As for theoretical tools, we introduce two complexity metrics that predict performance with more accuracy than the traditional time and message complexity metrics. The key point is that our metrics take account for resource contention, both on the network and the hosts; resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Extensive validation studies have been conducted. Currently, no widely accepted benchmarks exist for agreement algorithms or group communication toolkits, which makes comparing performance results from different sources difficult. In an attempt to consolidate the situation, we define a number of benchmarks for atomic broadcast. Our benchmarks include well-defined metrics, workloads and failure scenarios (faultloads). The use of the benchmarks is illustrated in two detailed case studies. Two widespread mechanisms for handling failures are unreliable failure detectors which provide inconsistent information about failures, and a group membership service which provides consistent information about failures, respectively. We analyze the performance tradeoffs of these two techniques, by comparing the performance of two atomic broadcast algorithms designed for an asynchronous system. Based on our results, we advocate a combined use of the two approaches to failure handling. In another case study, we compare two consensus algorithms designed for an asynchronous system. The two algorithms differ in how they coordinate the decision process: the one uses a centralized and the other a decentralized communication schema. Our results show that the performance tradeoffs are highly affected by a number of characteristics of the environment, like the availability of multicast and the amount of contention on the hosts versus the amount of contention on the network. Famous theoretical results state that a lot of important agreement problems are not solvable in the asynchronous system model. In our third case study, we investigate how these results are relevant for implementations of a replicated service, by conducting an experiment in a local area network. We exposed a replicated server to extremely high loads and required that the underlying failure detection service detects crashes very fast; the latter is important as the theoretical results are based on the impossibility of reliable failure detection. We found that our replicated server continued working even with the most extreme settings. We discuss the reasons for the robustness of our replicated server

Infoscience - École polytechnique fédérale de Lausanne

Evaluation of technologies of parallel computers' communication networks for a real-time triggering application in a high-energy physics experiment at CERN

Author: Hörtnagl C
Publication venue: Union of Concerned Scientists
Publication date: 01/01/1997
Field of study

CERN Document Server

Applications Development for the Computational Grid

Author: Abramson David
Publication venue: Monash University, InformationTechnology
Publication date: 01/01/2011
Field of study

University of Queensland eSpace

Advances in Grid Computing

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

This book approaches the grid computing with a perspective on the latest achievements in the field, providing an insight into the current research trends and advances, and presenting a large range of innovative research papers. The topics covered in this book include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence or genetic algorithm and quantum encryption are considered in order to explain two main aspects of grid computing: resource management and data management. The book addresses also some aspects of grid computing that regard architecture and development, and includes a diverse range of applications for grid computing, including possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning and complex water systems

Directory of Open Access Books (DOAB)

NASA Tech Briefs, January 2000

Author
Publication venue
Publication date
Field of study

Topics include: Data Acquisition; Computer-Aided Design and Engineering; Electronic Components and Circuits; Electronic Systems; Bio-Medical; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery/Automation; Information Sciences; Books and reports

NASA Technical Reports Server

CACIC 2015 : XXI Congreso Argentino de Ciencias de la Computación. Libro de actas

Author: Red de Universidades con Carreras en Informática
Publication venue: Universidad Nacional del Noroeste de la provincia de Buenos Aires (UNNOBA)
Publication date: 01/01/2015
Field of study

Actas del XXI Congreso Argentino de Ciencias de la Computación (CACIC 2015), realizado en Sede UNNOBA Junín, del 5 al 9 de octubre de 2015.Red de Universidades con Carreras en Informática (RedUNCI

Servicio de Difusión de la Creación Intelectual