Search CORE

2,080 research outputs found

Shared versus distributed memory multiprocessors

Author: Jordan Harry F.
Publication venue
Publication date
Field of study

The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors

NASA Technical Reports Server

MPF: A portable message passing facility for shared memory multiprocessors

Author: Malony Allen D.
Mcguire Patrick J.
Reed Daniel A.
Publication venue
Publication date
Field of study

The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations

NASA Technical Reports Server

An O(1) time complexity software barrier

Author: Carter John B.
Cheng Liqun
Publication venue: University of Utah
Publication date: 01/01/2004
Field of study

technical reportAs network latency rapidly approaches thousands of processor cycles and multiprocessors systems become larger and larger, the primary factor in determining a barrier algorithm?s performance is the number of serialized network latencies it requires. All existing barrier algorithms require at least O(log N) round trip message latencies to perform a single barrier operation on an N-node shared memory multiprocessor. In addition, existing barrier algorithms are not well tuned in terms of how they interact with modern shared memory systems, which leads to an excessive number of message exchanges to signal barrier completion. The contributions of this paper are threefold. First, we identify and quantitatively analyze the performance deficiencies of conventional barrier implementations when they are executed on real (non-idealized) hardware. Second, we propose a queue-based barrier algorithm that has effectively O(1)time complexity as measured in round trip message latencies. Third, by exploiting a hardware write-update (PUT) mechanism for signaling, we demonstrate how carefully matching the barrier implementation to the way that modern shared memory systems operate can improve performance dramatically. The resulting optimized algorithm only costs one round trip message latency to perform a barrier operation across N processors. Using a cycle-accurate execution-driven simulator of a future-generation SGI multiprocessor, we show that the proposed queue-based barrier outperforms conventional barrier implementations based on load-linked/storeconditional instructions by a factor of 5.43 (on 4 processors) to 93.96 (on 256 processors)

The University of Utah: J. Willard Marriott Digital Library

Memory performance of and-parallel prolog on shared-memory architectures

Author: Hermenegildo Manuel V.
Tick Evan
Publication venue: Facultad de Informática (UPM)
Publication date: 01/08/1988
Field of study

The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology

Archivo Digital UPM

The Impact of Parallel Processing on Operating Systems

Author: Felician ALECU
Publication venue
Publication date
Field of study

The base entity in computer programming is the process or task. The parallelism can be achieved by executing multiple processes on different processors. Distributed systems are managed by distributed operating systems that represent the extension for multiprocessor architectures of multitasking and multiprogramming operating systems.

Research Papers in Economics

An implementation of the flexible spin-lock model in ERIKA Enterprise on a multi-core platform

Author: Afshar S.
Behnam M.
Bril R.J.
Gai P.
Verwielen M.P.W.
Publication venue
Publication date: 01/01/2016
Field of study

Recently, the flexible spin-lock model (FSLM) has been introduced, unifying spin-based and suspension-based resource sharing protocols for real-time multiprocessor platforms by explicitly identifying the spin-lock priority as a parameter.Earlier work focused on the definition of a protocol for FSLM and its corresponding analysis under the assumption that various types of implementation overhead could be ignored.In this paper, we briefly describe an implementation of the FSLM for a selected range of spin-lock priorities in the ERIKA Enterprise RTOS as instantiated on an Altera Nios II platform using 4 soft-core processors. Moreover, we present measurement results for the protocol specific overhead of FSLM as well as thenatively provided multiprocessor stack resource policy (MSRP).Given these results, we are now in a position to judge when it is advantageous to use either MSRP or FMLP for our system set-up for given global resource access times of tasks

Pure OAI Repository

An implementation of the flexible spin-lock model in ERIKA Enterprise on a multi-core platform

Author: Afshar S.
Behnam M.
Bril R.J.
Gai P.
Verwielen M.P.W.
Publication venue
Publication date: 01/01/2016
Field of study

Pure OAI Repository