923 research outputs found
Java Grande Forum Report: Making Java Work for High-End Computing
This document describes the Java Grande Forum and includes its initial deliverables.Theseare reports that convey a succinct set of recommendations from this forum to SunMicrosystems and other purveyors of Java™ technology that will enable GrandeApplications to be developed with the Java programming language
Scaling finite difference methods in large eddy simulation of jet engine noise to the petascale: numerical methods and their efficient and automated implementation
Reduction of jet engine noise has recently become a new arena of competition between aircraft manufacturers. As a relatively new field of research in computational fluid dynamics (CFD), computational aeroacoustics (CAA) prediction of jet engine noise based on large eddy simulation (LES) is a robust and accurate tool that complements the existing theoretical and experimental approaches. In order to satisfy the stringent requirements of CAA on numerical accuracy, finite difference methods in LES-based jet engine noise prediction rely on the implicitly formulated compact spatial partial differentiation and spatial filtering schemes, a crucial component of which is an embedded solver for tridiagonal linear systems spatially oriented along the three coordinate directions of the computational space. Traditionally, researchers and engineers in CAA have employed manually crafted implementations of solvers including the transposition method, the multiblock method and the Schur complement method. Algorithmically, these solvers force a trade-off between numerical accuracy and parallel scalability. Programmingwise, implementing them for each of the three coordinate directions is tediously repetitive and error-prone. ^ In this study, we attempt to tackle both of these two challenges faced by researchers and engineers. We first describe an accurate and scalable tridiagonal linear system solver as a specialization of the truncated SPIKE algorithm and strategies for efficient implementation of the compact spatial partial differentiation and spatial filtering schemes. We then elaborate on two programming models tailored for composing regular grid-based numerical applications including finite difference-based LES of jet engine noise, one based on generalized elemental subroutines and the other based on functional array programming, and the accompanying code optimization and generation methodologies. Through empirical experiments, we demonstrate that truncated SPIKE-based spatial partial differentiation and spatial filtering deliver the theoretically promised optimal scalability in weak scaling conditions and can be implemented using the two programming models with performance on par with handwritten code while significantly reducing the required programming effort
Adaptivity in High-Performance Embedded Systems: a Reactive Control Model for Reliable and Flexible Design
International audienceSystem adaptivity is increasingly demanded in high-performance embedded systems, particularly in multimedia System-on-Chip (SoC), due to growing Quality of Service requirements. This paper presents a reactive control model that has been introduced in Gaspard, our framework dedicated to SoC hardware/software co-design. This model aims at expressing adaptivity as well as reconfigurability in systems performing data-intensive computations. It is generic enough to be used for description in the different parts of an embedded system, e.g. specification of how different data-intensive algorithms can be chosen according to some computation modes at the functional level; expression of how hardware components can be selected via the usage of a library of Intellectual Properties (IPs) according to execution performances. The transformation of this model towards synchronous languages is also presented, in order to allow an automatic code generation usable for formal verification, based of techniques such as model checking and controller synthesis as illustrated in the paper. This work, based on Model-Driven Engineering and the standard UML MARTE profile, has been implemented in Gaspard
The exploitation of parallelism on shared memory multiprocessors
PhD ThesisWith the arrival of many general purpose shared memory multiple processor
(multiprocessor) computers into the commercial arena during the mid-1980's, a
rift has opened between the raw processing power offered by the emerging
hardware and the relative inability of its operating software to effectively deliver
this power to potential users. This rift stems from the fact that, currently, no
computational model with the capability to elegantly express parallel activity is
mature enough to be universally accepted, and used as the basis for programming
languages to exploit the parallelism that multiprocessors offer. To add to this,
there is a lack of software tools to assist programmers in the processes of designing
and debugging parallel programs.
Although much research has been done in the field of programming languages,
no undisputed candidate for the most appropriate language for programming
shared memory multiprocessors has yet been found. This thesis examines why this
state of affairs has arisen and proposes programming language constructs,
together with a programming methodology and environment, to close the ever
widening hardware to software gap.
The novel programming constructs described in this thesis are intended for use
in imperative languages even though they make use of the synchronisation
inherent in the dataflow model by using the semantics of single assignment when
operating on shared data, so giving rise to the term shared values. As there are
several distinct parallel programming paradigms, matching flavours of shared
value are developed to permit the concise expression of these paradigms.The Science and Engineering Research Council
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biolog
Programming Abstractions for Data Locality
The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, we are rapidly moving to an era that computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor their applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on the hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume all the processing elements are equidistant to each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data centric and allow to describe how to decompose and how to layout data in the memory.Fortunately, there are many emerging concepts such as constructs for tiling, data layout, array views, task and thread affinity, and topology aware communication libraries for managing data locality. There is an opportunity to identify commonalities in strategy to enable us to combine the best of these concepts to develop a comprehensive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages to achieve this goal
SciQL, A query language for science applications
Scientific applications are still poorly served by contemporary
relational database systems.
At best, the system provides a bridge towards an external library using
user-defined functions, explicit import/export facilities or linked-in
Java/C# interpreters.
Time has come to rectify this with SciQL, a SQL-query language for
science applications with arrays as first class citizens.
It provides a seamless symbiosis of array-, set-, and sequence-
interpretation using a clear separation of the mathematical object from
its underlying storage representation.
The language extends value-based grouping in SQL with structural
grouping, i.e., fixed-sized and unbounded groups based on explicit
relationships between its index attributes.
It leads to a generalization of window-based query processing.
The SciQL architecture benefits from a column store system with an
adaptive storage scheme, including keeping multiple representations
around for reduced impedance mismatch.
This paper is focused on the language features, its architectural
consequences and extensive examples of its intended use
Array optimizations for high productivity programming languages
While the HPCS languages (Chapel, Fortress and X10) have introduced improvements in programmer productivity, several challenges still remain in delivering high performance. In the absence of optimization, the high-level language constructs that improve productivity can result in order-of-magnitude runtime performance degradations.
This dissertation addresses the problem of efficient code generation for high-level array accesses in the X10 language. The X10 language supports rank-independent specification of loop and array computations using regions and points. Three aspects of high-level array accesses in X10 are important for productivity but also pose significant performance challenges: high-level accesses are performed through Point objects rather than integer indices, variables containing references to arrays are rank-independent, and array subscripts are verified as legal array indices during runtime program execution.
Our solution to the first challenge is to introduce new analyses and transformations that enable automatic inlining and scalar replacement of Point objects. Our solution to the second challenge is a hybrid approach. We use an interprocedural rank analysis algorithm to automatically infer ranks of arrays in X10. We use rank analysis information to enable storage transformations on arrays. If rank-independent array references still remain after compiler analysis, the programmer can use X10's dependent type system to safely annotate array variable declarations with additional information for the rank and region of the variable, and to enable the compiler to generate efficient code in cases where the dependent type information is available. Our solution to the third challenge is to use a new interprocedural array bounds analysis approach using regions to automatically determine when runtime bounds checks are not needed.
Our performance results show that our optimizations deliver performance that rivals the performance of hand-tuned code with explicit rank-specific loops and lower-level array accesses, and is up to two orders of magnitude faster than unoptimized, high-level X10 programs. These optimizations also result in scalability improvements of X10 programs as we increase the number of CPUs. While we perform the optimizations primarily in X10, these techniques are applicable to other high-productivity languages such as Chapel and Fortress
Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers
Advanced Research Projects Agency (ARPA)National Aeronautics and Space AdministrationOpe
- …