1,557 research outputs found
ILP-based approaches to partitioning recurrent workloads upon heterogeneous multiprocessors
The problem of partitioning systems of independent constrained-deadline sporadic tasks upon heterogeneous multiprocessor platforms is considered. Several different integer linear program (ILP) formulations of this problem, offering different tradeoffs between effectiveness (as quantified by speedup bound) and running time efficiency, are presented.
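As a rough illustration of what such an ILP can look like, the following is a generic assignment formulation, not one of the formulations from the paper. It assumes n tasks and m processors, a hypothetical worst-case execution time C_{i,j} of task i on processor j, a relative deadline D_i, and a density-based test (total density at most 1 per processor, which is sufficient but not necessary for EDF schedulability of constrained-deadline sporadic tasks).

```latex
\begin{align*}
& x_{i,j} \in \{0, 1\} && 1 \le i \le n,\ 1 \le j \le m \\
& \sum_{j=1}^{m} x_{i,j} = 1 && 1 \le i \le n \\
& \sum_{i=1}^{n} x_{i,j}\,\frac{C_{i,j}}{D_i} \le 1 && 1 \le j \le m
\end{align*}
```

Here x_{i,j} = 1 means task i is placed on processor j; any feasible solution is a valid partitioning under the stated density assumption.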
Performance Guarantees of Local Search for Multiprocessor Scheduling
Increasing interest has recently been shown in analyzing the worst-case behavior of local search algorithms. In particular, the quality of local optima and the time needed to find the local optima by the simplest form of local search has been studied. This paper deals with worst-case performance of local search algorithms for makespan minimization on parallel machines. We analyze the quality of the local optima obtained by iterative improvement over the jump, swap, multi-exchange, and the newly defined push neighborhoods. Finally, for the jump neighborhood we provide bounds on the number of local search steps required to find a local optimum.
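To make the jump neighborhood concrete, here is a minimal Python sketch of iterative improvement for makespan minimization on identical machines. It is an illustration under stated assumptions, not the paper's algorithm: the function name `jump_local_search` and its return values are hypothetical, processing times are assumed to be positive integers, and a jump counts as improving when it moves a job off a critical (maximum-load) machine onto a machine that then stays strictly below the current makespan.

```python
def jump_local_search(proc_times, assignment, num_machines):
    """Iterative improvement over the jump neighborhood (illustrative sketch).

    proc_times[j] is the processing time of job j (assumed positive integer).
    assignment[j] is the machine index initially assigned to job j.
    Returns (assignment, makespan, number_of_jumps_performed).
    """
    assignment = list(assignment)
    loads = [0] * num_machines
    for job, machine in enumerate(assignment):
        loads[machine] += proc_times[job]

    steps = 0
    while True:
        makespan = max(loads)
        # If any machine can accept a job, the least-loaded one can too.
        target = min(range(num_machines), key=lambda m: loads[m])
        move = None
        for job, machine in enumerate(assignment):
            on_critical = loads[machine] == makespan
            fits = loads[target] + proc_times[job] < makespan
            if on_critical and proc_times[job] > 0 and fits:
                move = job
                break
        if move is None:
            # No improving jump exists: the schedule is jump-optimal.
            return assignment, makespan, steps
        loads[assignment[move]] -= proc_times[move]
        loads[target] += proc_times[move]
        assignment[move] = target
        steps += 1


# Example: five jobs, all initially on machine 0 of 2 machines.
final_assignment, final_makespan, jumps = jump_local_search(
    [3, 3, 2, 2, 2], [0, 0, 0, 0, 0], 2
)
```

Each jump either lowers the makespan or reduces the number of critical machines, so with positive integer processing times the loop terminates.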
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
A parallel implementation of a multisensor feature-based range-estimation method
There are many proposed vision-based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real-time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten frames per second to thirty or more frames per second depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and shared-memory parallel computer.
Design and analysis of numerical algorithms for the solution of linear systems on parallel and distributed architectures
The increasing availability of parallel computers is having a very significant impact on
all aspects of scientific computation, including algorithm research and software
development in numerical linear algebra. In particular, the solution of linear systems,
which lies at the heart of most calculations in scientific computing, is an important
computation found in many engineering and scientific applications.
In this thesis, well-known parallel algorithms for the solution of linear systems are
compared with implicit parallel algorithms or the Quadrant Interlocking (QI) class of
algorithms to solve linear systems. These implicit algorithms are (2x2) block
algorithms expressed in explicit point form notation. [Continues.]
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Parallel implementation of the finite element method on shared memory multiprocessors
PhD Thesis. The work presented in this thesis concerns parallel methods for finite element
analysis. The research has been funded by British Gas and some of the presented
material involves work on their software. Practical problems involving the finite
element method can use a large amount of processing power and the execution
times can be very large. It is consequently important to investigate the possibilities
for the parallel implementation of the method. The research has been carried out
on an Encore Multimax, a shared memory multiprocessor with 14 identical CPUs.
We first experimented with autoparallelising a large British Gas finite element
program (GASP4) using Encore's parallelising Fortran compiler (epf). The parallel
program generated by epf proved not to be efficient. The main reasons are
the complexity of the code and small grain parallelism. Since the program is hard
for the compiler to analyse at high levels, only small grain parallelism has been
inserted automatically into the code. This involves a great deal of low level
synchronisation, which produces large overheads and causes inefficiency. A detailed
analysis of the autoparallelised code has been made with a view to determining
the reasons for the inefficiency. Suggestions have also been made about writing
programs such that they are suitable for efficient autoparallelisation.
The finite element method consists of the assembly of a stiffness matrix and
the solution of a set of simultaneous linear equations. A sparse representation of
the stiffness matrix has been used to allow experimentation on large problems.
Parallel assembly techniques for the sparse representation have been developed.
Some of these methods have proved to be very efficient giving speed ups that are
near ideal.
For the solution phase, we have used the preconditioned conjugate gradient
method (PCG). An incomplete LU factorization of the stiffness matrix with no fill-in
(ILU(0)) has been found to be an effective preconditioner. The factors can be
obtained at a low cost. We have parallelised all the steps of the PCG method. The
main bottleneck is the triangular solves (preconditioning operations) at each step.
Two parallel methods of triangular solution have been implemented. One is based
on level scheduling (row-oriented parallelism) and the other is a new approach
called independent columns (column-oriented parallelism). The algorithms have
been tested for row and red-black orderings of the nodal unknowns in the finite
element meshes considered.
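The idea behind level scheduling can be sketched in a few lines of Python. This is a generic illustration of the technique, not the thesis implementation: the names `level_schedule` and `solve_lower_by_levels` are hypothetical, the strictly lower triangular part is assumed to be stored as one dictionary per row mapping column index to value, and the diagonal is kept in a separate list. Rows in the same level do not depend on each other, so a parallel code could hand each level's rows to different processors; the inner loop here runs them sequentially.

```python
def level_schedule(lower_rows):
    """Group the rows of a sparse lower triangular matrix into levels.

    lower_rows[i] maps column j (with j < i) to the nonzero entry L[i][j].
    A row's level is one more than the highest level among the rows it
    depends on; rows with no dependencies are in level 0.
    """
    level = []
    for row in lower_rows:
        level.append(1 + max((level[j] for j in row), default=-1))
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lev in enumerate(level):
        groups[lev].append(i)
    return groups


def solve_lower_by_levels(lower_rows, diag, b):
    """Solve L x = b by forward substitution, one level at a time."""
    x = [0.0] * len(b)
    for group in level_schedule(lower_rows):
        # Every row in `group` reads only x values from earlier levels,
        # so this inner loop is the part that could run in parallel.
        for i in group:
            partial = sum(v * x[j] for j, v in lower_rows[i].items())
            x[i] = (b[i] - partial) / diag[i]
    return x


# Example: rows 0 and 1 are independent; rows 2 and 3 depend on them.
rows = [{}, {}, {0: 1.0, 1: 1.0}, {1: 1.0}]
levels = level_schedule(rows)
solution = solve_lower_by_levels(rows, [2.0, 2.0, 2.0, 2.0], [2.0, 4.0, 7.0, 6.0])
```

The number of levels bounds the number of synchronisation points, which is one reason the ordering of the unknowns affects parallel performance.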
The best speed ups obtained are 7.29 (on 12 processors) for level scheduling
and 7.11 (on 12 processors) for independent columns. Red-black ordering gives
rise to better parallel performance than row ordering in general. An analysis of
methods for the improvement of the parallel efficiency has been made.
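For reference, parallel efficiency is conventionally defined as speed-up divided by processor count. Applying that standard definition (which the abstract does not state explicitly) to the reported figures gives roughly:

```latex
E = \frac{S}{p}, \qquad
E_{\text{level scheduling}} = \frac{7.29}{12} \approx 0.61, \qquad
E_{\text{independent columns}} = \frac{7.11}{12} \approx 0.59
```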
Principles for problem aggregation and assignment in medium scale multiprocessors
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine-grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
System configuration and executive requirements specifications for reusable shuttle and space station/base