209 research outputs found

Recommended from our members

Executing matrix multiply on a process oriented data flow machine

Author: Bic Lubomir
Nagel Mark D.
Roy John M.A.
Publication venue: eScholarship, University of California
Publication date: 01/01/1990

The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on partitioning and mapping of processes. In PODS, arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there will be only one producer of each element. This producing PE owns that element and will perform the necessary computations to assign it. Using this approach, the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way PODS allows MIMD machines to exploit vector and data parallelism easily while still providing the flexibility of MIMD over SIMD for multi-user systems. In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms such as LU decomposition, convolution, and the fast Fourier transform. The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the work load evenly across the PEs. The key result is that PODS can scale matrix multiply in a near-linear fashion until there is little or no work to be performed for each PE. Then overhead and message passing become a major component of the execution time. With larger problems (e.g., ≥16k data points) this limit would be reached at around 256 PEs.
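The partitioning idea in this abstract (consecutive elements assigned evenly to PEs, with the owning PE computing each element exactly once) can be illustrated with a small sketch. This is not the PODS simulator or its actual distribution code, which the abstract does not give; the function names `block_range` and `matmul_owner_computes`, the ceiling-division block size, and the sequential simulation of PEs are all assumptions made for illustration.

```python
def block_range(pe, n_elements, n_pes):
    """Half-open range of flat element indices owned by PE `pe` (assumed scheme:
    consecutive blocks of ceil(n_elements / n_pes) elements)."""
    block = -(-n_elements // n_pes)  # ceiling division
    start = min(pe * block, n_elements)
    end = min(start + block, n_elements)
    return start, end


def matmul_owner_computes(a, b, n_pes):
    """Multiply matrices a (n x m) and b (m x p), with each simulated PE
    filling only the result elements it owns. PEs run one after another here;
    on a real machine they would run in parallel."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[None] * p for _ in range(n)]   # each slot is written exactly once
    work = [0] * n_pes                   # elements produced per PE
    for pe in range(n_pes):
        start, end = block_range(pe, n * p, n_pes)
        for flat in range(start, end):
            i, j = divmod(flat, p)       # flat index -> (row, column)
            acc = 0
            for k in range(m):           # loop-carried dependency on acc
                acc += a[i][k] * b[k][j]
            c[i][j] = acc                # single assignment by the owner
            work[pe] += 1
    return c, work


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    product, per_pe = matmul_owner_computes(a, b, n_pes=2)
    print(product, per_pe)
```

When the number of PEs approaches the number of result elements, each PE's share of `work` shrinks toward one element or none, which is the regime where the abstract says overhead and message passing start to dominate.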

eScholarship - University of California

CASCH: a tool for computer-aided scheduling

Author: Ahmad I
Kwok YK
Shu W
Wu MY
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2000

A software tool called Computer-Aided Scheduling (CASCH) for parallel processing on distributed-memory multiprocessors in a complete parallel programming environment is presented. A compiler automatically converts sequential applications into parallel codes to perform program parallelization. The parallel code that executes on a target machine is optimized by CASCH through proper scheduling and mapping.

HKU Scholars Hub

Dynamic Systolization for Developing Multiprocessor Supercomputers

Author: Hwang Kai
Xu Zhiwei
Publication venue: Purdue University (bepress)
Publication date: 01/10/1984

A dynamic network approach is introduced for developing reconfigurable systolic arrays or wavefront processors. This allows one to design very powerful and flexible processors to be used in a general-purpose, reconfigurable, and fault-tolerant multiprocessor computer system. The concepts of macro-dataflow and multitasking can be integrated to handle variable-resolution granularities in computationally intensive algorithms. A multiprocessor architecture, Remps, is proposed based on these design methodologies. The Remps architecture is generalized from the Cedar, HEP, Cray X-MP, Trac, NYU Ultracomputer, S-1, Pumps, Chip, and SAM projects. Our goal is to provide a multiprocessor research model for developing design methodologies, multiprocessing and multitasking supports, dynamic systolic/wavefront array processors, interconnection networks, reconfiguration techniques, and performance analysis tools. These system design and operational techniques should be useful to those who are developing or evaluating multiprocessor supercomputers.

Purdue E-Pubs

Structure driven multiprocessor compilation of numeric problems

Author: Prasanna Gorur Narayana Srinivasa
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1991

Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1991. Title as it appears in the Feb. 1991 M.I.T. Graduate List: Structure driven compilation of numeric problems. Includes bibliographical references (leaves 134-136). By G.N. Srinivasa Prasanna.

DSpace@MIT

Dataflow development of medium-grained parallel software

Author: Harley Jonathan William
Publication venue: Newcastle University
Publication date: 01/01/1993

PhD Thesis. In the 1980s, multiple-processor computers (multiprocessors) based on conventional processing elements emerged as a popular solution to the continuing demand for ever-greater computing power. These machines offer a general-purpose parallel processing platform on which the size of program units which can be efficiently executed in parallel (the "grain size") is smaller than that offered by distributed computing environments, though greater than that of some more specialised architectures. However, programming to exploit this medium-grained parallelism remains difficult. Concurrent execution is inherently complex, yet there is a lack of programming tools to support parallel programming activities such as program design, implementation, debugging and performance tuning. In helping to manage complexity in sequential programming, visual tools have often been used to great effect, which suggests one approach towards the goal of making parallel programming less difficult. This thesis examines the possibilities which the dataflow paradigm has to offer as the basis for a set of visual parallel programming tools, and presents a dataflow notation designed as a framework for medium-grained parallel programming. The implementation of this notation as a programming language is discussed, and its suitability for the medium-grained level is examined. Sponsorship: Science and Engineering Research Council of Great Britain; EC ERASMUS scheme.

Newcastle University eTheses

Program allocation for hypercube based dataflow systems

Author: Freytag Vincent R.
Publication venue: Oregon State University

The dataflow model of computation differs from the traditional control-flow model of computation in that it does not utilize a program counter to sequence instructions in a program. Instead, the execution of instructions is based solely on the availability of their operands. Thus, an instruction is executed in a dataflow computer when all of its operands are available. This asynchronous nature of the dataflow model of computation allows the exploitation of fine-grain parallelism inherent in programs. Although the dataflow model of computation exploits parallelism, the problem of optimally allocating a program to processors belongs to the class of NP-complete problems. Therefore, one of the major issues facing designers of dataflow multiprocessors is the proper allocation of programs to processors. The problem of program allocation lies in maximizing parallelism while minimizing interprocessor communication costs. The culmination of research in the area of program allocation has produced the proposed method called the Balanced Layered Allocation Scheme that utilizes heuristic rules to strike a balance between computation time and communication costs in dataflow multiprocessors. Specifically, the proposed allocation scheme utilizes Critical Path and Longest Directed Path heuristics when allocating instructions to processors. Simulation studies indicate that the proposed scheme is effective in reducing the overall execution time of a program by considering the effects of communication costs on computation times
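The firing rule described in this abstract (an instruction executes as soon as all of its operands are available, with no program counter) can be shown in a few lines. The sketch below is only an illustration of that rule; it is not the Balanced Layered Allocation Scheme, and it ignores processor allocation and communication costs entirely. The graph encoding and the name `run_dataflow` are assumptions.

```python
import operator


def run_dataflow(graph, inputs):
    """Execute a dataflow graph by the firing rule.

    graph  maps a node name to (function, tuple of operand names).
    inputs maps initial token names to values.
    Returns the final values and the list of "waves": each wave holds the
    nodes whose operands were all available at the same step, so they
    could have fired in parallel.
    """
    values = dict(inputs)
    pending = dict(graph)
    waves = []
    while pending:
        # A node is ready once every operand it needs has a value.
        ready = [name for name, (_, operands) in pending.items()
                 if all(op in values for op in operands)]
        if not ready:
            raise ValueError("no node can fire: missing operand or cycle")
        for name in ready:
            func, operands = pending.pop(name)
            values[name] = func(*(values[op] for op in operands))
        waves.append(sorted(ready))
    return values, waves


if __name__ == "__main__":
    # (a + b) * (a - b): the add and subtract nodes are independent.
    graph = {
        "sum": (operator.add, ("a", "b")),
        "diff": (operator.sub, ("a", "b")),
        "prod": (operator.mul, ("sum", "diff")),
    }
    values, waves = run_dataflow(graph, {"a": 5, "b": 3})
    print(values["prod"], waves)
```

The nodes within one wave are the fine-grain parallelism the abstract refers to; deciding which processor runs each of them, and at what communication cost, is the allocation problem the thesis addresses.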

ScholarsArchive@OSU

Static Scheduling Strategies for Heterogeneous Systems

Author: Beaumont Olivier
Legrand Arnaud
Robert Yves
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 21/02/2012

In this paper, we consider static scheduling techniques for heterogeneous systems, such as clusters and grids. We deal in turn with minimum makespan scheduling, divisible load scheduling and steady-state scheduling. Finally, we discuss the limitations of static scheduling approaches.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Chapter 4 DATAFLOW ANALYSIS FOR REAL-TIME EMBEDDED MULTIPROCESSOR SYSTEM DESIGN

Author: Bart Mesman
Er Stuijk
Jan David Mol
Jef Van Meerbergen
Marco Bekooij
O Moreira
Peter Poplavko
Rob Hoes

Dataflow analysis techniques are key to reducing the number of design iterations and shortening the design time of real-time embedded network-based multiprocessor systems that process data streams. With these analysis techniques the worst-case end-to-end temporal behavior of hard real-time applications can be derived from a dataflow model in which computation, communication and arbitration are modeled. For soft real-time applications these static dataflow analysis techniques are combined with simulation of the dataflow model to test statistical assertions about their temporal behavior. The simulation results in combination with properties of the dataflow model are used to derive the sensitivity of design parameters and to estimate parameters like the capacity of data buffers.

Keywords: real-time, dataflow analysis, multiprocessor system, predictable design, system-on-chip

CiteSeerX

The exploitation of parallelism on shared memory multiprocessors

Author: Stoker Michael Allan
Publication venue: Newcastle University
Publication date: 01/01/1990

PhD Thesis. With the arrival of many general-purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980s, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware-to-software gap. The novel programming constructs described in this thesis are intended for use in imperative languages, even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms. Sponsorship: The Science and Engineering Research Council.

Newcastle University eTheses