Devito: Towards a generic Finite Difference DSL using Symbolic Python
Domain-specific languages (DSLs) have been used in a variety of fields to
express complex scientific problems in a concise manner and provide automated
performance optimization for a range of computational architectures. As such,
DSLs provide a powerful mechanism to speed up scientific Python computation
that goes beyond traditional vectorization and pre-compilation approaches,
while allowing domain scientists to build applications within the comforts of
the Python software ecosystem. In this paper we present Devito, a new finite
difference DSL that provides optimized stencil computation from high-level
problem specifications based on symbolic Python expressions. We demonstrate
Devito's symbolic API and performance advantages over traditional Python
acceleration methods before highlighting its use in the scientific context of
seismic inversion problems. Comment: pyHPC 2016 conference submission
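The abstract's premise is that stencil code can be derived from a symbolic description of a derivative rather than written by hand. As a hedged illustration of that idea only (this is not Devito's API, and Devito itself builds on SymPy), the sketch below derives finite-difference weights for an arbitrary set of grid offsets by matching Taylor-series terms in exact rational arithmetic, then applies the resulting stencil to sampled data. The names `fd_weights` and `apply_stencil` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def fd_weights(offsets, deriv):
    """Return weights w such that sum(w[j] * f(x + offsets[j]*h)) / h**deriv
    approximates the deriv-th derivative of f at x (Taylor-series matching).
    Offsets must be distinct integers."""
    n = len(offsets)
    if deriv >= n:
        raise ValueError("need more stencil points than the derivative order")
    # Row k: sum_j w_j * offset_j**k / k! must equal 1 for k == deriv, else 0.
    A = [[Fraction(o) ** k / factorial(k) for o in offsets] for k in range(n)]
    b = [Fraction(int(k == deriv)) for k in range(n)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    # A is now diagonal, so each weight is a single division.
    return [b[i] / A[i][i] for i in range(n)]


def apply_stencil(values, offsets, weights, h, deriv):
    """Apply the stencil at every interior point where all offsets are in range."""
    lo, hi = -min(offsets), len(values) - max(offsets)
    scale = h ** deriv
    return [
        sum(float(w) * values[i + o] for o, w in zip(offsets, weights)) / scale
        for i in range(lo, hi)
    ]
```

For offsets `[-1, 0, 1]` and `deriv=2` this yields the classic weights 1, -2, 1; widening the offsets to `[-2, -1, 0, 1, 2]` yields the fourth-order weights -1/12, 4/3, -5/2, 4/3, -1/12. Computing such coefficients from a specification, instead of hard-coding them, is the general kind of automation a symbolic finite-difference DSL offers.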
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared-nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94% efficiency compared to the serial version on a cluster using 100 cores for
matrices of dimension 38000 × 38000.
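The Floyd-Warshall algorithm mentioned above is short enough to sketch serially. This is a plain-Python illustration, not FooPar code. The comment marks where a distributed version needs group communication: in the common block-partitioned formulation (an assumption here, since the abstract does not describe FooPar's decomposition), every process needs the relevant parts of row `k` and column `k` at step `k`.

```python
INF = float("inf")


def floyd_warshall(dist):
    """All-pairs shortest paths; dist[i][j] is the edge weight or INF."""
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy, leave the input intact
    for k in range(n):
        # A block-distributed version would broadcast row k and column k
        # to the processes owning the affected blocks at this step.
        row_k = d[k]
        for i in range(n):
            d_ik = d[i][k]
            if d_ik == INF:
                continue  # no path i -> k, nothing to relax through k
            row_i = d[i]
            for j in range(n):
                via_k = d_ik + row_k[j]
                if via_k < row_i[j]:
                    row_i[j] = via_k
    return d
```

The triple loop makes the O(n³) cost explicit, which is why matrices of the reported size are a reasonable target for 100 cores.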
BioNessie(G) - a grid-enabled biochemical networks simulation environment
The simulation of biochemical networks provides insight and
understanding about the underlying biochemical processes and pathways
used by cells and organisms. BioNessie is a biochemical network simulator
which has been developed at the University of Glasgow. This paper
describes the simulator and focuses in particular on how it has been
extended to benefit from a wide variety of high performance compute resources
across the UK through Grid technologies to support larger-scale
simulations.
cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
Modern processor architectures, in addition to having ever more cores, also
require ever more attention to memory layout in order to run at full
capacity. The usefulness of most languages is declining, as their
abstractions, structures or objects are hard to map efficiently onto modern
processor architectures.
This paper introduces a new abstract machine framework, cphVB,
that enables vector oriented high-level programming languages to map onto a
broad range of architectures efficiently. The idea is to close the gap between
high-level languages and hardware optimized low-level implementations. By
translating high-level vector operations into an intermediate vector bytecode,
cphVB enables specialized vector engines to efficiently execute the vector
operations.
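The idea of translating high-level vector operations into an intermediate bytecode can be illustrated with a toy front end: arithmetic on vector handles emits instructions into a list instead of computing, and a separate engine interprets that list. The opcodes, class names and instruction layout below are hypothetical and far simpler than cphVB's actual bytecode. The sketch only shows the separation between a recording front end and an execution engine.

```python
import itertools
import operator

# Hypothetical opcode table for the toy "vector engine".
OPCODES = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


class Recorder:
    """Collects vector instructions instead of executing them eagerly."""

    def __init__(self):
        self.program = []  # list of (opcode, out, in1, in2) tuples
        self.inputs = {}   # register name -> input data
        self._ids = itertools.count()

    def new_name(self):
        return f"v{next(self._ids)}"

    def array(self, data):
        name = self.new_name()
        self.inputs[name] = list(data)
        return Vec(self, name)


class Vec:
    """Handle to a (possibly not yet computed) vector register."""

    def __init__(self, rec, name):
        self.rec, self.name = rec, name

    def _emit(self, opcode, other):
        out = self.rec.new_name()
        self.rec.program.append((opcode, out, self.name, other.name))
        return Vec(self.rec, out)

    def __add__(self, other):
        return self._emit("ADD", other)

    def __sub__(self, other):
        return self._emit("SUB", other)

    def __mul__(self, other):
        return self._emit("MUL", other)


def execute(program, inputs):
    """A naive engine: interprets each instruction element-wise, in order."""
    regs = dict(inputs)
    for opcode, out, a, b in program:
        fn = OPCODES[opcode]
        regs[out] = [fn(x, y) for x, y in zip(regs[a], regs[b])]
    return regs
```

Writing `z = x * y + x` on two recorded arrays produces two instructions, a `MUL` followed by an `ADD`, and nothing is computed until `execute` runs. Because an engine sees the whole instruction list before running it, a real engine would be free to fuse, reorder or parallelize instructions; this naive interpreter does none of that.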
The primary success criteria are to maintain complete abstraction from
low-level details and to provide efficient code execution across different
modern processors. We evaluate the presented design through a setup that
targets multi-core CPU architectures, measuring the performance of the
implementation with Python implementations of well-known algorithms: a Jacobi
solver, a kNN search, a shallow water simulation and a synthetic stencil
simulation. All of them demonstrate good performance.
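Of the benchmarks listed, the Jacobi solver is the simplest to show. The sketch below is a generic Jacobi iteration for a diagonally dominant linear system Ax = b in plain Python. It is not the benchmark code used in the paper, which the abstract does not reproduce. It illustrates why the method suits vector execution: each sweep computes every component from the previous iterate only. The function name and the `tol`/`max_iter` parameters are illustrative assumptions.

```python
def jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Solve A x = b by Jacobi iteration (A assumed diagonally dominant)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Every component uses only the previous iterate x, so all n
        # updates are independent and could run as one vector operation.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")
```

For A = [[4, 1], [2, 5]] and b = [1, 2], the iteration converges to x = (1/6, 1/3). Gauss-Seidel would usually converge in fewer sweeps, but its in-place updates introduce a sequential dependency that works against this kind of vectorization.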
Generic access to symbolic computing services
Symbolic computation is one of the computational domains that require large computational
resources. Computer Algebra Systems (CAS), the main tools used for symbolic
computations, are mainly designed to be used as software tools installed on standalone
machines that do not provide the required resources for solving large symbolic computation
problems. To support such computations, an infrastructure built upon
massively distributed computational environments must be developed.
Building such an infrastructure requires a thorough analysis of
the most important requirements raised by the symbolic computation community, and the infrastructure must
be built on the most suitable architectural styles and technologies. The architecture
that we propose is composed of several main components: the Computer Algebra
System (CAS) Server, which exposes the functionality implemented by one or more supporting
CASs through generic Grid Service interfaces; the Architecture for Grid
Symbolic Services Orchestration (AGSSO) Server, which allows seamless composition of
CAS Server capabilities; and client-side libraries that help users describe workflows
for symbolic computations directly within the CAS environment. We have also
designed and developed a framework for automatic data management of mathematical
content that relies on OpenMath encoding.
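OpenMath represents mathematical objects as trees of applications (`OMA`), symbols drawn from content dictionaries (`OMS`), variables (`OMV`) and integers (`OMI`). As a hedged sketch of the kind of encoding such a data-management layer relies on (not the authors' framework), the code below turns a small nested-tuple expression into OpenMath XML using only the standard library. The tuple format and the names `to_openmath` and `_encode` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

OM_NS = "http://www.openmath.org/OpenMath"


def _encode(expr):
    """Map one node of the (hypothetical) tuple format to an OpenMath element."""
    if isinstance(expr, bool):
        raise TypeError("booleans are not supported in this sketch")
    if isinstance(expr, int):
        node = ET.Element("OMI")   # integer literal
        node.text = str(expr)
        return node
    if isinstance(expr, str):
        return ET.Element("OMV", name=expr)  # variable
    cd, name, *args = expr         # (content dictionary, symbol, *arguments)
    app = ET.Element("OMA")        # application of a symbol to arguments
    app.append(ET.Element("OMS", cd=cd, name=name))
    for arg in args:
        app.append(_encode(arg))
    return app


def to_openmath(expr):
    """Wrap the encoded tree in an OMOBJ element and serialize it."""
    root = ET.Element("OMOBJ", xmlns=OM_NS)
    root.append(_encode(expr))
    return ET.tostring(root, encoding="unicode")
```

For example, `to_openmath(("arith1", "plus", ("arith1", "power", "x", 2), 1))` encodes x² + 1 using the `plus` and `power` symbols of the `arith1` content dictionary. A CAS-independent tree like this is what lets different systems behind a common service interface exchange the same mathematical object.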
To support validation and fine-tuning of the system, we have developed a simulation
platform that mimics the environment in which the architecture is deployed.
- …