445 research outputs found

    The exploitation of parallelism on shared memory multiprocessors

    Get PDF
    PhD ThesisWith the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980's, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted, and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever widening hardware to software gap. The novel programming constructs described in this thesis are intended for use in imperative languages even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms.The Science and Engineering Research Council

    Simulation models of shared-memory multiprocessor systems

    Get PDF

    Dataflow development of medium-grained parallel software

    Get PDF
    PhD ThesisIn the 1980s, multiple-processor computers (multiprocessors) based on conven- tional processing elements emerged as a popular solution to the continuing demand for ever-greater computing power. These machines offer a general-purpose parallel processing platform on which the size of program units which can be efficiently executed in parallel - the "grain size" - is smaller than that offered by distributed computing environments, though greater than that of some more specialised architectures. However, programming to exploit this medium-grained parallelism remains difficult. Concurrent execution is inherently complex, yet there is a lack of programming tools to support parallel programming activities such as program design, implementation, debugging, performance tuning and so on. In helping to manage complexity in sequential programming, visual tools have often been used to great effect, which suggests one approach towards the goal of making parallel programming less difficult. This thesis examines the possibilities which the dataflow paradigm has to offer as the basis for a set of visual parallel programming tools, and presents a dataflow notation designed as a framework for medium-grained parallel programming. The implementation of this notation as a programming language is discussed, and its suitability for the medium-grained level is examinedScience and Engineering Research Council of Great Britain EC ERASMUS schem

    Execution replay and debugging

    Full text link
    As most parallel and distributed programs are internally non-deterministic -- consecutive runs with the same input might result in a different program flow -- vanilla cyclic debugging techniques as such are useless. In order to use cyclic debugging tools, we need a tool that records information about an execution so that it can be replayed for debugging. Because recording information interferes with the execution, we must limit the amount of information and keep the processing of the information fast. This paper contains a survey of existing execution replay techniques and tools.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AADebug 2000), August 2000, Munich. cs.SE/001003

    Quantitative performance evaluation of SCI memory hierarchies

    Get PDF

    Some aspects of the efficient use of multiprocessor control systems

    Get PDF
    Computer technology, particularly at the circuit level, is fast approaching its physical limitations. As future needs for greater power from computing systems grows, increases in circuit switching speed (and thus instruction speed) will be unable to match these requirements. Greater power can also be obtained by incorporating several processing units into a single system. This ability to increase the performance of a system by the addition of processing units is one of the major advantages of multiprocessor systems. Four major characteristics of multiprocessor systems have been identified (28) which demonstrate their advantage. These are:- Throughput Flexibility Availability Reliability The additional throughput obtained from a multiprocessor has been mentioned above.. This increase in the power of the system can be obtained in a modular fashion with extra processors being added as greater processing needs arise. The addition of extra processors also has (in general) the desirable advantage of giving a smoother cost - performance curve ( 63). Flexibility is obtained from the increased ability to construct a system matching the user 'requirements at a given time without placing restrictions upon future expansion. With multiprocessor systems; the potential also exists of making greater use of the resources within the system. Availability and reliability are inter-related. Increased availability is achieved, in a well designed system, by ensuring that processing capabilities can be provided to the user even if one (or more) of the processing units has failed. The service provided, however, will probably be degraded due to the reduction in processing capacity. Increased reliability is obtained by the ability of the processing units to compensate for the failure of one of their number. This recovery may involve complex software checks and a consequent decrease in available power even when all the units are functioning

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Get PDF
    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica- tions using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and o er some suggested areas for language development and integration between coarse-grained and ne-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial di erential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scienti c applications developers

    Digital signal conditioning on multiprocessor systems

    Get PDF
    An important application area of modem computer systems is that of digital signal processing. This discipline is concerned with the analysis or modification of digitally represented signals, through the use of simple mathematical operations. A primary need of such systems is that of high data throughput. Although optimised programmable processors are available, system designers are now looking towards parallel processing to gain further performance increases. Such parallel systems may be easily constructed using the transputer family of processors. However, although these devices are comparatively easy to program, they possess a general von Neumann core and so are relatively inefficient at implementing digital signal processing algorithms. The power of the transputer lies in its ability to communicate effectively, not in its computational capability. The converse is true of specialised digital signal processors. These devices have been designed specifically to implement the type of small data intensive operations required by digital signal processing algorithms, but have not been designed to operate efficiently in a multiprocessor environment. This thesis examines the performance of both types of processors with reference to a common signal processing application, multichannel filtering. The transputer is examined in both uniprocessor and multiprocessor configurations, and its performance analysed. A theoretical model of program behaviour is developed, in order to assess the performance benefits of particular code structures and the effects of such parameters as data block size. The transputer implementation is contrasted with that of the Motorola DSP56001 digital signal processor. This device is found to be much more efficient at implementing such algorithms on a single device, but provides limited multiprocessor support. Using the conclusions of this assessment, a hybrid multiprocessor has been designed. This consists of a transputer controlling a number of signal processors, communicating through shared memory, separating tiie tasks of computation and communication. Forcing the transputer to communicate through shared memory causes problems, and these have been addressed. A theoretical performance model of the system has been produced. A small system has been constructed, and is currently running performance test software
    corecore