Designing Practical Efficient Algorithms for Symmetric Multiprocessors
Symmetric multiprocessors (SMPs) dominate the high-end server market and are
currently the primary candidate for constructing large scale multiprocessor
systems. Yet, the design of efficient parallel algorithms for this platform
currently poses several challenges. In this paper, we present a computational
model for designing efficient algorithms for symmetric multiprocessors. We
then use this model to create efficient solutions to two widely different
types of problems: linked list prefix computations and generalized sorting.
Our novel algorithm for prefix computations builds upon the sparse ruling set
approach of Reid-Miller and Blelloch. Besides being somewhat simpler and
requiring nearly half the number of memory accesses, we can bound our
complexity with high probability instead of merely on average. Our algorithm
for generalized sorting is a modification of our algorithm for sorting by
regular sampling on distributed memory architectures. The algorithm is a
stable sort which appears to be asymptotically faster than any of the
published algorithms for SMPs. Both of our algorithms were implemented in C
using POSIX threads and run on three symmetric multiprocessors: the DEC
AlphaServer, the Silicon Graphics Power Challenge, and the HP-Convex Exemplar.
We ran our code for each algorithm using a variety of benchmarks which we
identified to examine the dependence of our algorithm on memory access
patterns. In spite of the fact that the processors must compete for access
to main memory, both algorithms still yielded scalable performance up to 16
processors, which was the largest platform available to us. For some
problems, our prefix computation algorithm actually matched or exceeded the
performance of the best sequential solution using only a single thread.
Similarly, our generalized sorting algorithm always beat the performance of
sequential merge sort by at least an order of magnitude, even with a single
thread. (Also cross-referenced as UMIACS-TR-98-44)
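To make the linked list prefix computation concrete, here is a minimal sketch of the general ruling-set idea: pick a few "ruler" nodes, let each ruler walk its own sublist independently, combine the sublist totals, then add the offsets back. It is a sequential simulation written for illustration, not the authors' algorithm or their C/POSIX threads code. The function name `list_prefix_sums`, the successor-array layout, and the `num_rulers` and `seed` parameters are all assumptions.

```python
import random


def list_prefix_sums(succ, values, head, num_rulers, seed=0):
    """Prefix sums over a linked list stored as a successor array.

    succ[i] is the index of the node after i, or -1 at the tail.
    Assumes every node is reachable from `head` (one list, no cycles).
    Returns result[i] = sum of values from `head` up to and including node i.
    """
    n = len(succ)
    rng = random.Random(seed)

    # Phase 1: choose rulers at random; the head is always a ruler.
    others = [i for i in range(n) if i != head]
    rulers = [head] + rng.sample(others, min(num_rulers, len(others)))
    is_ruler = [False] * n
    for r in rulers:
        is_ruler[r] = True

    # Phase 2: each ruler walks its sublist until the next ruler.
    # The walks touch disjoint nodes, so they could run on separate threads.
    local = [0] * n          # prefix sum within the node's own sublist
    owner = [head] * n       # ruler that owns each node
    sub_total = {}           # total of each ruler's sublist
    next_ruler = {}          # ruler that follows each sublist (-1 at the end)
    for r in rulers:
        running = 0
        node = r
        while True:
            running += values[node]
            local[node] = running
            owner[node] = r
            node = succ[node]
            if node == -1 or is_ruler[node]:
                break
        sub_total[r] = running
        next_ruler[r] = node

    # Phase 3: short sequential prefix over the rulers, in list order.
    offset = {}
    acc = 0
    r = head
    while r != -1:
        offset[r] = acc
        acc += sub_total[r]
        r = next_ruler[r]

    # Phase 4: add each sublist's offset to its local prefix sums.
    return [local[i] + offset[owner[i]] for i in range(n)]
```

Only Phase 3 is inherently sequential, and it touches one entry per ruler rather than one per node, which is why approaches of this shape can scale when the number of rulers is small relative to the list length.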
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Concurrent Access Algorithms for Different Data Structures: A Research Review
Algorithms for concurrent data structures have gained attention in recent years as multi-core processors have become ubiquitous. Several features of shared-memory multiprocessors make concurrent data structures significantly more difficult to design and to verify as correct than their sequential counterparts. The primary source of this additional difficulty is concurrency. This paper provides an overview of some concurrent access algorithms for different data structures.
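As a point of reference for what "concurrent access" means in its simplest form, the sketch below guards a stack with a single lock so that pushes and pops from several threads cannot interleave mid-operation. It is a generic coarse-grained locking example, not an algorithm taken from the review; the names `LockedStack` and `fill_concurrently` are hypothetical.

```python
import threading


class LockedStack:
    """A stack whose every operation holds one lock (coarse-grained locking)."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:
            self._items.append(item)

    def pop(self):
        # Returns None when the stack is empty instead of raising.
        with self._lock:
            return self._items.pop() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


def fill_concurrently(stack, num_threads, pushes_per_thread):
    """Push (thread id, counter) pairs onto the stack from several threads."""

    def worker(tid):
        for k in range(pushes_per_thread):
            stack.push((tid, k))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A single lock is easy to reason about but serializes all access; finer-grained and lock-free designs trade that simplicity for more parallelism, which is where the design and verification difficulty grows.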
Applications and accuracy of the parallel diagonal dominant algorithm
The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
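For readers unfamiliar with tridiagonal systems, the following is a sketch of the standard sequential Thomas algorithm, the usual baseline that parallel tridiagonal solvers are compared against. It is not the PDD algorithm and makes no attempt at partitioning. The function name `thomas_solve` and the array convention are assumptions, and the method is only guaranteed to be stable for well-behaved systems such as diagonally dominant ones.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the sequential Thomas algorithm.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    All four lists have length n; lower[0] and upper[n-1] are ignored.
    No pivoting is done, so the matrix should be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution from the last unknown upward.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

A symmetric Toeplitz system with constant diagonals, for example 1, 4, 1, is diagonally dominant, so it is a convenient test input for a solver of this kind.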
The exploitation of parallelism on shared memory multiprocessors
PhD Thesis
With the arrival of many general purpose shared memory multiple processor
(multiprocessor) computers into the commercial arena during the mid-1980's, a
rift has opened between the raw processing power offered by the emerging
hardware and the relative inability of its operating software to effectively deliver
this power to potential users. This rift stems from the fact that, currently, no
computational model with the capability to elegantly express parallel activity is
mature enough to be universally accepted, and used as the basis for programming
languages to exploit the parallelism that multiprocessors offer. To add to this,
there is a lack of software tools to assist programmers in the processes of designing
and debugging parallel programs.
Although much research has been done in the field of programming languages,
no undisputed candidate for the most appropriate language for programming
shared memory multiprocessors has yet been found. This thesis examines why this
state of affairs has arisen and proposes programming language constructs,
together with a programming methodology and environment, to close the ever
widening hardware to software gap.
The novel programming constructs described in this thesis are intended for use
in imperative languages even though they make use of the synchronisation
inherent in the dataflow model by using the semantics of single assignment when
operating on shared data, so giving rise to the term shared values. As there are
several distinct parallel programming paradigms, matching flavours of shared
value are developed to permit the concise expression of these paradigms.
The Science and Engineering Research Council
Computational methods and software systems for dynamics and control of large space structures
Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers.
Analysis and implementation of the multiprocessor bandwidth inheritance protocol
The Multiprocessor Bandwidth Inheritance (M-BWI) protocol is an extension of the Bandwidth Inheritance (BWI) protocol for symmetric multiprocessor systems. Similar to Priority Inheritance, M-BWI lets a task that has locked a resource execute in the resource reservations of the blocked tasks, thus reducing their blocking time. The protocol is particularly suitable for open systems where different kinds of tasks dynamically arrive and leave, because it guarantees temporal isolation among independent subsets of tasks without requiring any information on their temporal parameters. Additionally, if the temporal parameters of the interacting tasks are known, it is possible to compute an upper bound to the interference suffered by a task due to other interacting tasks. Thus, it is possible to provide timing guarantees for a subset of interacting hard real-time tasks. Finally, the M-BWI protocol is neutral to the underlying scheduling policy: it can be implemented in global, clustered and semi-partitioned scheduling.
After introducing the M-BWI protocol, in this paper we formally prove its isolation properties, and propose an algorithm to compute an upper bound to the interference suffered by a task. Then, we describe our implementation of the protocol for the LITMUS^RT real-time testbed, and measure its overhead. Finally, we compare M-BWI against FMLP and OMLP, two other protocols for resource sharing in multiprocessor systems.
Discrete-Time Chaotic-Map Truly Random Number Generators: Design, Implementation, and Variability Analysis of the Zigzag Map
In this paper, we introduce a novel discrete chaotic map named zigzag map
that demonstrates excellent chaotic behaviors and can be utilized in Truly
Random Number Generators (TRNGs). We comprehensively investigate the map and
explore its critical chaotic characteristics and parameters. We further present
two circuit implementations for the zigzag map based on the switched current
technique as well as the current-mode affine interpolation of the breakpoints.
In practice, implementation variations can deteriorate the quality of the
output sequence as a result of variation of the chaotic map parameters. In
order to quantify the impact of variations on the map performance, we model the
variations using a combination of theoretical analysis and Monte-Carlo
simulations on the circuits. We demonstrate that even in the presence of the
map variations, a TRNG based on the zigzag map passes all of the NIST 800-22
statistical randomness tests using simple post processing of the output data.
Comment: To appear in Analog Integrated Circuits and Signal Processing (ALOG)
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.