Search CORE

11,073 research outputs found

An assumed partition algorithm for determining processor inter-communication

Author: A.H. Baker
Baker
Falgout
Falgout
Falgout
Matocha
Pinar
R.D. Falgout
Robert
Thakur
U.M. Yang
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

Recommended from our members

Synthesis from specifications : basic concepts

Author: Gajski Daniel D.
Narayan Sanjiv
Vahid Frank
Publication venue: eScholarship, University of California
Publication date: 29/01/1990
Field of study

The need has evolved for a synthesis tool at the computer system level. SpecSyn is one such tool. Basically, it will view the world as a set of chips communicating via protocols. Thus, an abstract specification would get synthesized into a set of one or more interconnected chips. From that point, detail is added to each chip's specification until its structure is synthesized or it is determined that a prefabricated chip similar in functionality can be used.Features of such a tool include executable specifications from which to synthesize, constraint driven partitioning of the specifications into components (chips) and synthesis of interfaces between them, translation into VHDL and synthesis into VHDL structures of micro-architectural components, and the use of other tools (e.g. MILO, a micro-architecture and logic optimizer, and LES, a layout expert system) to evaluate the quality of the chip layout generated from VHDL description.A major component of SpecSyn is SpecCharts, a high level specification language amenable to system level synthesis, able to represent designs from system to register transfer levels. The language consists of a hierarchy of states, represented in combined graphical and textual form, at the same time catering to the expression of concurrent behavior and specification of constraints. With it we have specified several Intel chips as well as higher level systems, and have found it to be quite powerful and easy to use.SpecSyn will have a graphical interface, from which the user can at any time view or edit a SpecChart, translate to VHDL and simulate, view statistics provided by estimators (such as area, speed, and pins), store and retrieve SpecCharts, apply basic Spec Chart operations, as well as apply the partitioning algorithms or interface synthesizer. Providing access to a wide range of tools, having a single language represent the design throughout the synthesis process, and having user specified constraints allow the user to have varying amounts of control over the synthesis process

eScholarship - University of California

Optimal dynamic remapping of parallel computations

Author: Nicol David M.
Reynolds Paul F., Jr.
Publication venue
Publication date
Field of study

A large class of computations are characterized by a sequence of phases, with phase changes occurring unpredictably. The decision problem was considered regarding the remapping of workload to processors in a parallel computation when the utility of remapping and the future behavior of the workload is uncertain, and phases exhibit stable execution requirements during a given phase, but requirements may change radically between phases. For these problems a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases. The fundamental problem of balancing the expected remapping performance gain against the delay cost was addressed. Stochastic dynamic programming is used to show that the remapping decision policy minimizing the expected running time of the computation has an extremely simple structure. Because the gain may not be predictable, the performance of a heuristic policy that does not require estimnation of the gain is examined. The heuristic method's feasibility is demonstrated by its use on an adaptive fluid dynamics code on a multiprocessor. The results suggest that except in extreme cases, the remapping decision problem is essentially that of dynamically determining whether gain can be achieved by remapping after a phase change. The results also suggest that this heuristic is applicable to computations with more than two phases

NASA Technical Reports Server

The effect of real workloads and stochastic workloads on the performance of allocation and scheduling algorithms in 2D mesh multicomputers

Author: Abaneh I.
Bani-Mohammad S.
Ferguson J.D.
Mackenzie L.M.
Ould-Khaoua M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2008
Field of study

The performance of the existing non-contiguous processor allocation strategies has been traditionally carried out by means of simulation based on a stochastic workload model to generate a stream of incoming jobs. To validate the performance of the existing algorithms, there has been a need to evaluate the algorithms' performance based on a real workload trace. In this paper, we evaluate the performance of several well-known processor allocation and job scheduling strategies based on a real workload trace and compare the results against those obtained from using a stochastic workload. Our results reveal that the conclusions reached on the relative performance merits of the allocation strategies when a real workload trace is used are in general compatible with those obtained when a stochastic workload is used

Crossref

Enlighten

Stencils and problem partitionings: Their influence on the performance of multiple processor systems

Author: Adams L. M.
Patrick M. L.
Reed D. A.
Publication venue
Publication date
Field of study

Given a discretization stencil, partitioning the problem domain is an important first step for the efficient solution of partial differential equations on multiple processor systems. Partitions are derived that minimize interprocessor communication when the number of processors is known a priori and each domain partition is assigned to a different processor. This partitioning technique uses the stencil structure to select appropriate partition shapes. For square problem domains, it is shown that non-standard partitions (e.g., hexagons) are frequently preferable to the standard square partitions for a variety of commonly used stencils. This investigation is concluded with a formalization of the relationship between partition shape, stencil structure, and architecture, allowing selection of optimal partitions for a variety of parallel systems

NASA Technical Reports Server

A survey of parallel execution strategies for transitive closure and logic programs

Author: Cacace F.
Ceri S.
Houtsma M.A.W.
Publication venue: Kluwer Academic Publishers
Publication date: 01/01/1993
Field of study

An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

CiteSeerX

University of Twente Research Information

Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

Author: Das Raja
Mavriplis D. J.
Saltz Joel
Vermeland R. E.
Publication venue
Publication date
Field of study

An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made

NASA Technical Reports Server

Automated problem scheduling and reduction of synchronization delay effects

Author: Saltz Joel H.
Publication venue
Publication date
Field of study

It is anticipated that in order to make effective use of many future high performance architectures, programs will have to exhibit at least a medium grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable preformance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because: (1) they provide a useful model problem for use in exploring heuristics for the aggregation, mapping and scheduling of relatively fine grained computations whose data dependencies are specified by directed acrylic graphs, and (2) because such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the partitioning techniques presented in shared memory architectures with varying relative synchronization costs

NASA Technical Reports Server