16,567 research outputs found

Components and Interfaces of a Process Management System for Parallel Programs

Author: Butler Ralph
Gropp William
Lusk Ewing
Publication venue
Publication date: 01/01/2001
Field of study

Parallel jobs are different from sequential jobs and require a different type of process management. We present here a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising thousands of processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables much faster startup and better runtime management of parallel jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. We describe a simple but general interface that can be used to separate any process manager from a parallel library, which we use to keep MPD separate from MPICH.

Comment: 12 pages, Workshop on Clusters and Computational Grids for Scientific Computing, Sept. 24-27, 2000, Le Chateau de Faverges de la Tour, France

arXiv.org e-Print Archive

CiteSeerX

UNT Digital Library

Massively Parallel Computation Using Graphics Processors with Application to Optimal Experimentation in Dynamic Control

Author: Mathur Sudhanshu
Morozov Sergei
Publication venue
Publication date
Field of study

The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has led to its adoption in many non-graphics applications, including a wide variety of scientific computing fields. At the same time, a number of important dynamic optimal policy problems in economics are athirst for computing power to help overcome the dual curses of complexity and dimensionality. We investigate whether computational economics may benefit from the new tools through a case study of an imperfect-information dynamic programming problem with a learning and experimentation trade-off, that is, a choice between controlling the policy target and learning system parameters. Specifically, we use a model of active learning and control of a linear autoregression with unknown slope that has appeared in a variety of macroeconomic policy and other contexts. The endogeneity of posterior beliefs makes the problem difficult in that the value function need not be convex and the policy function need not be continuous. This complication makes the problem a suitable target for massively parallel computation using graphics processors. Our findings are cautiously optimistic in that the new tools let us easily achieve a factor of 15 performance gain relative to an implementation targeting single-core processors and thus establish a better reference point on the computational speed vs. coding complexity trade-off frontier. While further gains and wider applicability may lie behind a steep learning barrier, we argue that the future of many computations belongs to parallel algorithms anyway.

Keywords: Graphics Processing Units, CUDA programming, Dynamic programming, Learning, Experimentation

Research Papers in Economics

RELEASE: A High-level Paradigm for Reliable Large-scale Server Software

Author: A. Leung
C. Hewitt
D. Dewolfs
D. Ungar
G. Agha
G. Germain
H. Rajan
J. Zhao
K. Sagonas
L. Seiler
M. Snir
R. Chandra
R.K. Karmani
S. Srinivasan
T. Arts
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the first six months. The project aim is to scale Erlang’s radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene

CiteSeerX

Crossref

Kent Academic Repository

RELEASE: A High-level Paradigm for Reliable Large-scale Server Software

Author: Chechina Natalia
Trinder Phil
Publication venue
Publication date: 01/01/2012
Field of study

Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project, and describes the progress in the first six months. The project aim is to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene

Enlighten

Stream Fusion, to Completeness

Author: ACM
Biboudis A.
Peyton Jones S.
Kiselyov O.
Pouzet M.
Prokopec A.
Taha W.
Waters R. C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/12/2016
Field of study

Stream processing is mainstream (again): Widely-used stream libraries are now available for virtually all modern OO and functional languages, from Java to C# to Scala to OCaml to Haskell. Yet expressivity and performance are still lacking. For instance, the popular, well-optimized Java 8 streams do not support the zip operator and are still an order of magnitude slower than hand-written loops. We present the first approach that represents the full generality of stream processing and eliminates overheads, via the use of staging. It is based on an unusually rich semantic model of stream interaction. We support any combination of zipping, nesting (or flat-mapping), sub-ranging, filtering, and mapping, of finite or infinite streams. Our model captures idiosyncrasies that a programmer uses in optimizing stream pipelines, such as rate differences and the choice of "for" vs. "while" loops. Our approach delivers hand-written-like code, but automatically. It explicitly avoids the reliance on black-box optimizers and sufficiently-smart compilers, offering the highest, guaranteed, and portable performance. Our approach relies on high-level concepts that are then readily mapped into an implementation. Accordingly, we have two distinct implementations: an OCaml stream library, staged via MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we derive libraries richer and simultaneously many tens of times faster than past work. We greatly exceed in performance the standard stream libraries available in Java, Scala and OCaml, including the well-optimized Java 8 streams.

arXiv.org e-Print Archive

Crossref

Reconfigurable microarchitectures at the programmable logic interface

Author: Donlin Adam
Publication venue: The University of Edinburgh
Publication date: 01/01/2001
Field of study

Edinburgh Research Archive

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Author: A. Peymandoust
Alastair R. Beresford
Andreas Gal Albert Noll
Bram Adams
Bratin Saha
Carl Hewitt
Charles Antony Richard Hoare
Charles R. Johns
Chen-Yong Cher
Colin Blundell
David Ungar
David Wentzlaff
Doug Lea
ECMA International
Edward A. Lee
Freescale Semiconductor
Georg Sorst
Gul Agha
Hans Schippers
Haris Volos
Intel Corporation
James Gosling
Jim Gray
John A. Trono
John S. Danaher
John Zigman
José M. Piquer
Kevin Casey
Kevin Williams
Larry Seiler
Lukasz Ziarek
M. Anton Ertl
Mark S. Miller
Maurice Herlihy
Michael Haupt
Michael R. Marty
Nir Shavit
Pascal Costanza
Philipp Haller
Rajesh K. Karmani
Robert D. Blumofe
Robert Virding
Simon Gay
Sriram Srinivasan
Stefan Marr
Stijn Timbermont
Theo D'Hondt
Thomas Kistler
Tom Van Cutsem
Uwe Kastens
Vijay A. Saraswat
Virendra J. Marathe
Wenzhang Zhu
Wolfgang De Meuter
Xu Wang
Yaoqing Gao
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2010
Field of study

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository