Search CORE

37 research outputs found

Improving the scalability of parallel N-body applications with an event driven constraint based execution model

Author: Aarseth SJ
Alfieri RA
Bonachea D
Chandra R
Dekate C
El-Ghazawi T
Hewitt C
Kale L
Message Passing Interface Forum
O’Shea BW
Salmon JK
Singh JP
Publication venue: 'SAGE Publications'
Publication date: 23/09/2011
Field of study

The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends like multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing.Comment: 11 figure

arXiv.org e-Print Archive

Crossref

First-Class Functions for First-Order Database Engines

Author: Grust Torsten
Ulrich Alexander
Publication venue
Publication date: 01/08/2013
Field of study

We describe Query Defunctionalization which enables off-the-shelf first-order database engines to process queries over first-class functions. Support for first-class functions is characterized by the ability to treat functions like regular data items that can be constructed at query runtime, passed to or returned from other (higher-order) functions, assigned to variables, and stored in persistent data structures. Query defunctionalization is a non-invasive approach that transforms such function-centric queries into the data-centric operations implemented by common query processors. Experiments with XQuery and PL/SQL database systems demonstrate that first-order database engines can faithfully and efficiently support the expressive "functions as data" paradigm.Comment: Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Ital

arXiv.org e-Print Archive

CiteSeerX

On One-Pass CPS Transformations

Author: Danvy Olivier
Nielsen Lasse R.
Publication venue: 'Aarhus University Library'
Publication date: 05/01/2002
Field of study

We bridge two distinct approaches to one-pass CPS transformations, i.e., CPS transformations that reduce administrative redexes at transformation time instead of in a post-processing phase. One approach is compositional and higher-order, and is due to Appel, Danvy and Filinski, and Wand, building on Plotkin's seminal work. The other is non-compositional and based on a syntactic theory of the lambda-calculus, and is due to Sabry and Felleisen. To relate the two approaches, we use Church encoding, Reynolds's defunctionalization, and an implementation technique for syntactic theories, refocusing, developed in the second author's PhD thesis

Crossref

Tidsskrift.dk (Det Kongelige Bibliotek)

Programs with continuations and linear logic

Author: Nishizaki Shin-ya
Publication venue: Published by Elsevier B.V.
Publication date: 31/10/1993
Field of study

AbstractThe purpose of this paper is to investigate the programs with continuations in the framework of classical linear logic which can also be regarded as improvement of traditional classical logic. First, simply typed λ-calculus λ→c with continuation primitives is introduced. Second, a translation from λ→c to linear logic is given. Last, a correspondence between them is presented

Elsevier - Publisher Connector

GraphStep: A System Architecture for Sparse-Graph Algorithms

Author: DeHon André
deLorimier Michael
Eslick Ian
Kapre Nachiket
Knight Thomas F., Jr.
Mehta Nikil
Rizzo Dominic
Rubin Raphael
Uribe Tomás E.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this “memory wall,” we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order of magnitude speedup per FPGA compared to a state-of-the-art Pentium processor

CiteSeerX

Crossref

Caltech Authors

On One-Pass CPS Transformations

Author: Danvy Olivier
Millikin Kevin
Nielsen Lasse R.
Publication venue: 'Aarhus University Library'
Publication date: 12/03/2007
Field of study

We bridge two distinct approaches to one-pass CPS transformations, i.e., CPS transformations that reduce administrative redexes at transformation time instead of in a post-processing phase. One approach is compositional and higher-order, and is independently due to Appel, Danvy and Filinski, and Wand, building on Plotkin's seminal work. The other is non-compositional and based on a reduction semantics for the lambda-calculus, and is due to Sabry and Felleisen. To relate the two approaches, we use three tools: Reynolds's defunctionalization and its left inverse, refunctionalization; a special case of fold-unfold fusion due to Ohori and Sasano, fixed-point promotion; and an implementation technique for reduction semantics due to Danvy and Nielsen, refocusing. This work is directly applicable to transforming programs into monadic normal form

Tidsskrift.dk (Det Kongelige Bibliotek)

Recommended from our members

Stack-based Typed Assembly Language

Author: Crary Karl
Glew Neal
Morrisett John Gregory
Walker David
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 02/03/2010
Field of study

In previous work, we presented a Typed Assembly Language (TAL). TAL is sufficiently expressive to serve as a target language for compilers of high-level languages such as ML. This work assumed such a compiler would perform a continuation-passing style transform and eliminate the control stack by heap-allocating activation records. However, most compilers are based on stack allocation. This paper presents STAL, an extension of TAL with stack constructs and stack types to support the stack allocation style. We show that STAL is sufficiently expressive to support languages such as Java, Pascal, and ML; constructs such as exceptions and displays; and optimizations such as tail call elimination and callee-saves registers. This paper also formalizes the typing connection between CPS-based compilation and stack-based compilation and illustrates how STAL can formally model calling conventions by specifying them as formal translations of source function types to STAL types.Engineering and Applied Science

Harvard University - DASH

The Parallel Persistent Memory Model

Author: Berryhill R.
Blelloch G. E.
Buettner M.
Chauhan H.
Herlihy M.
JaJa J.
Lee S. K.
Meena J. S.
Nawab F.
Pelley S.
Woude J. Van Der
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/06/2018
Field of study

We consider a parallel computational model that consists of

P

processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of

O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil)

in expectation, where

W

and

D

are the work and depth of the computation (in the absence of failures),

P_A

is the average number of processors available during the computation, and

f \le 1/2

is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives.Comment: This paper is the full version of a paper at SPAA 2018 with the same nam

arXiv.org e-Print Archive

Crossref

DSpace@MIT