Search CORE

1,647 research outputs found

Parallel Computers and Complex Systems

Author: G.C. Fox
P.D. Coddington
Publication venue: University Press
Publication date
Field of study

We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines

CiteSeerX

Distributed Logic Objects: A Fragment of Rewriting Logic and its Implementation

Author: Ciampolini Anna
Lamma Evelina
Mello Paola
Stefanelli Cesare
Publication venue
Publication date: 01/01/1996
Field of study

Abstract This paper presents a logic language (called Distributed Logic Objects, DLO for short) that supports objects, messages and inheritance. The operational semantics of the language is given in terms of rewriting rules acting upon the (possibly distributed) state of the system. In this sense, the logic underlying the language is Rewriting Logic. In the paper we discuss the implementation of this language on distributed memory MIMD architectures, and we describe the advantages achieved in terms of flexibility, scalability and load balancing. In more detail, the implementation is obtained by translating logic objects into a concurrent logic language based on multi-head clauses, taking advantage from its distributed implementation on a massively parallel architecture. In the underlying implementation, objects are clusters of processes, objects' state is represented by logical variables, message-passing communication between objects is performed via multi-head clauses, and inheritance is mapped into clause union. Some interesting features such as transparent object migration and intensional messages are easily achieved thanks to the underlying support. In the paper, we also sketch a (direct) distributed implementation supporting the indexing of clauses for single-named methods

Elsevier - Publisher Connector

Archivio istituzionale della ricerca - Università di Ferrara

Open Access Repository

Rewriting History: Repurposing Domain-Specific CGRAs

Author: Ainsworth Sam
Brauckmann Alexander
Cummins Chris
Koehler Thomas
O'Boyle Michael F. P.
Woodruff Jackson
Publication venue
Publication date: 16/09/2023
Field of study

Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations they do not natively support. FlexC uses dataflow rewriting, replacing unsupported regions of code with equivalent operations that are supported by the CGRA. We use equality saturation, a technique enabling efficient exploration of a large space of rewrite rules, to effectively search through the program-space for supported programs. We applied FlexC to over 2,000 loop kernels, compiling to four different research CGRAs and 300 generated CGRAs and demonstrate a 2.2

\times

increase in the number of loop kernels accelerated leading to 3

\times

speedup compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the accelerator

arXiv.org e-Print Archive

Programming distributed memory architectures using Kali

Author: Mehrotra Piyush
Vanrosendale John
Publication venue
Publication date
Field of study

Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system on annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language is described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed

NASA Technical Reports Server

Many-Task Computing and Blue Waters

Author: Armstrong Timothy G.
Katz Daniel S.
Wilde Michael
Wozniak Justin M.
Zhang Zhao
Publication venue
Publication date: 01/01/2012
Field of study

This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Logic, parallelism and semantic networks : the binary predicate execution model

Author: Lee Craig Alexander
Publication venue: eScholarship, University of California
Publication date: 01/01/1988
Field of study

This thesis develops the Binary Predicate Execution Model; a distributed, massively-parallel system for semantic networks and knowledge bases that is built on a subset of first-order predicate logic. The use of logic gives the model an easily-understood programming paradigm and a well-defined semantics of execution. When expressed in binary predicates, a simple graphical interpretation can be used. All program facts are represented in an assertion graph. Each vertex is associated with a term appearing in a fact and the edges are labeled with the predicate names. Similar graphs are also associated with each rule body and the query. Finding all possible solutions corresponds to finding all possible matches between the query graph and the assertion graph. Invoking a rule corresponds to substituting the graph of its body constrained by the dependencies between its arguments. This can be implemented in a parallel, message-passing fashion where the assertion graph vertices are active processing elements which asynchronously exchange messages identifying different parts of the query that remain to be matched and containing any binding information from previous matching required to accomplish this. The model is data-driven since every message can be immediately processed without the need for any centralized control or centralized memory. By restricting how functional terms can occur, distributed data structures and remote data look-ups for unification are eliminated. Thus, the model's performance on increasingly larger problems scales-up given increasingly larger machines in most cases. Architectural support for the model is investigated and simulation results of a relatively simple software implementation are reported. This suggests performance on the order of 10^5 logical inferences per second for 256 processing elements in an n-cube configuration. Further research directions, including that of increasing efficiency, are discussed

eScholarship - University of California

A common distributed language approach to software integration

Author: Antonelli Charles J.
Mudge Trevor N.
Volz Richard A.
Publication venue
Publication date
Field of study

An important objective in software integration is the development of techniques to allow programs written in different languages to function together. Several approaches are discussed toward achieving this objective and the Common Distributed Language Approach is presented as the approach of choice

NASA Technical Reports Server

A survey of parallel execution strategies for transitive closure and logic programs

Author: Cacace F.
Ceri S.
Houtsma M.A.W.
Publication venue: Kluwer Academic Publishers
Publication date: 01/01/1993
Field of study

An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

CiteSeerX

University of Twente Research Information

An implementation of static functional process networks

Author: AI Limited
H. Glaser
J. Hughes
W.W. Wadge
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref