445 research outputs found
The exploitation of parallelism on shared memory multiprocessors
PhD ThesisWith the arrival of many general purpose shared memory multiple processor
(multiprocessor) computers into the commercial arena during the mid-1980's, a
rift has opened between the raw processing power offered by the emerging
hardware and the relative inability of its operating software to effectively deliver
this power to potential users. This rift stems from the fact that, currently, no
computational model with the capability to elegantly express parallel activity is
mature enough to be universally accepted, and used as the basis for programming
languages to exploit the parallelism that multiprocessors offer. To add to this,
there is a lack of software tools to assist programmers in the processes of designing
and debugging parallel programs.
Although much research has been done in the field of programming languages,
no undisputed candidate for the most appropriate language for programming
shared memory multiprocessors has yet been found. This thesis examines why this
state of affairs has arisen and proposes programming language constructs,
together with a programming methodology and environment, to close the ever
widening hardware to software gap.
The novel programming constructs described in this thesis are intended for use
in imperative languages even though they make use of the synchronisation
inherent in the dataflow model by using the semantics of single assignment when
operating on shared data, so giving rise to the term shared values. As there are
several distinct parallel programming paradigms, matching flavours of shared
value are developed to permit the concise expression of these paradigms.The Science and Engineering Research Council
Dataflow development of medium-grained parallel software
PhD ThesisIn the 1980s, multiple-processor computers (multiprocessors) based on conven-
tional processing elements emerged as a popular solution to the continuing demand
for ever-greater computing power. These machines offer a general-purpose parallel
processing platform on which the size of program units which can be efficiently
executed in parallel - the "grain size" - is smaller than that offered by distributed
computing environments, though greater than that of some more specialised
architectures. However, programming to exploit this medium-grained parallelism
remains difficult. Concurrent execution is inherently complex, yet there is a lack of
programming tools to support parallel programming activities such as program
design, implementation, debugging, performance tuning and so on.
In helping to manage complexity in sequential programming, visual tools have
often been used to great effect, which suggests one approach towards the goal of
making parallel programming less difficult.
This thesis examines the possibilities which the dataflow paradigm has to offer
as the basis for a set of visual parallel programming tools, and presents a dataflow
notation designed as a framework for medium-grained parallel programming. The
implementation of this notation as a programming language is discussed, and its
suitability for the medium-grained level is examinedScience and Engineering Research Council of Great Britain
EC ERASMUS schem
Execution replay and debugging
As most parallel and distributed programs are internally non-deterministic --
consecutive runs with the same input might result in a different program flow
-- vanilla cyclic debugging techniques as such are useless. In order to use
cyclic debugging tools, we need a tool that records information about an
execution so that it can be replayed for debugging. Because recording
information interferes with the execution, we must limit the amount of
information and keep the processing of the information fast. This paper
contains a survey of existing execution replay techniques and tools.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop
on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003
Some aspects of the efficient use of multiprocessor control systems
Computer technology, particularly at the circuit level, is fast
approaching its physical limitations. As future needs for greater
power from computing systems grows, increases in circuit switching
speed (and thus instruction speed) will be unable to match these
requirements.
Greater power can also be obtained by incorporating several processing
units into a single system. This ability to increase the performance
of a system by the addition of processing units is one of the major
advantages of multiprocessor systems. Four major characteristics of
multiprocessor systems have been identified (28) which demonstrate
their advantage. These are:-
Throughput
Flexibility
Availability
Reliability
The additional throughput obtained from a multiprocessor has been
mentioned above.. This increase in the power of the system can be
obtained in a modular fashion with extra processors being added as
greater processing needs arise. The addition of extra processors
also has (in general) the desirable advantage of giving a smoother
cost - performance curve ( 63). Flexibility is obtained from the
increased ability to construct a system matching the user 'requirements
at a given time without placing restrictions upon future expansion.
With multiprocessor systems; the potential also exists of making
greater use of the resources within the system.
Availability and reliability are inter-related. Increased availability
is achieved, in a well designed system, by ensuring that processing
capabilities can be provided to the user even if one (or more) of the
processing units has failed. The service provided, however, will
probably be degraded due to the reduction in processing capacity.
Increased reliability is obtained by the ability of the processing
units to compensate for the failure of one of their number. This
recovery may involve complex software checks and a consequent decrease
in available power even when all the units are functioning
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some applications paradigms. Software languages and systems such as NVIDIA's
CUDA and Khronos consortium's open compute language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica-
tions using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications
level programmer and o er some suggested areas for language development and integration
between coarse-grained and ne-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas including: partial di erential equations; graph cluster
metric calculations and random number generation. We report on programming experiences
and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs;
a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scienti c applications developers
Digital signal conditioning on multiprocessor systems
An important application area of modem computer systems is that of digital signal processing. This discipline is concerned with the analysis or modification of digitally represented signals, through the use of simple mathematical operations. A primary need of such systems is that of high data throughput. Although optimised programmable processors are available, system designers are now looking towards parallel processing to gain further performance increases. Such parallel systems may be easily constructed using the transputer family of processors. However, although these devices are comparatively easy to program, they possess a general von Neumann core and so are relatively inefficient at implementing digital signal processing algorithms. The power of the transputer lies in its ability to communicate effectively, not in its computational capability. The converse is true of specialised digital signal processors. These devices have been designed specifically to implement the type of small data intensive operations required by digital signal processing algorithms, but have not been designed to operate efficiently in a multiprocessor environment. This thesis examines the performance of both types of processors with reference to a common signal processing application, multichannel filtering. The transputer is examined in both uniprocessor and multiprocessor configurations, and its performance analysed. A theoretical model of program behaviour is developed, in order to assess the performance benefits of particular code structures and the effects of such parameters as data block size. The transputer implementation is contrasted with that of the Motorola DSP56001 digital signal processor. This device is found to be much more efficient at implementing such algorithms on a single device, but provides limited multiprocessor support. Using the conclusions of this assessment, a hybrid multiprocessor has been designed. This consists of a transputer controlling a number of signal processors, communicating through shared memory, separating tiie tasks of computation and communication. Forcing the transputer to communicate through shared memory causes problems, and these have been addressed. A theoretical performance model of the system has been produced. A small system has been constructed, and is currently running performance test software
- …