2,054 research outputs found
A Language and Hardware Independent Approach to Quantum-Classical Computing
Heterogeneous high-performance computing (HPC) systems offer novel
architectures which accelerate specific workloads through judicious use of
specialized coprocessors. A promising architectural approach for future
scientific computations is provided by heterogeneous HPC systems integrating
quantum processing units (QPUs). To this end, we present XACC (eXtreme-scale
ACCelerator) --- a programming model and software framework that enables
quantum acceleration within standard or HPC software workflows. XACC follows a
coprocessor machine model that is independent of the underlying quantum
computing hardware, thereby enabling quantum programs to be defined and
executed on a variety of QPUs types through a unified application programming
interface. Moreover, XACC defines a polymorphic low-level intermediate
representation, and an extensible compiler frontend that enables language
independent quantum programming, thus promoting integration and
interoperability across the quantum programming landscape. In this work we
define the software architecture enabling our hardware and language independent
approach, and demonstrate its usefulness across a range of quantum computing
models through illustrative examples involving the compilation and execution of
gate and annealing-based quantum programs
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware
Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications
Energy efficiency is becoming increasingly important for computing systems,
in particular for large scale HPC facilities. In this work we evaluate, from an
user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS)
techniques, assisted by the power and energy monitoring capabilities of modern
processors in order to tune applications for energy efficiency. We run selected
kernels and a full HPC application on two high-end processors widely used in
the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate
the available trade-offs between energy-to-solution and time-to-solution,
attempting a function-by-function frequency tuning. We finally estimate the
benefits obtainable running the full code on a HPC multi-GPU node, with respect
to default clock frequency governors. We instrument our code to accurately
monitor power consumption and execution time without the need of any additional
hardware, and we enable it to change CPUs and GPUs clock frequencies while
running. We analyze our results on the different architectures using a simple
energy-performance model, and derive a number of energy saving strategies which
can be easily adopted on recent high-end HPC systems for generic applications
TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation
The paper is concerned with the issue of how software systems actually use
Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power
consumption on these resources. It argues the need for novel methods and tools
to support software developers aiming to optimise power consumption resulting
from designing, developing, deploying and running software on HPAs, while
maintaining other quality aspects of software to adequate and agreed levels. To
do so, a reference architecture to support energy efficiency at application
construction, deployment, and operation is discussed, as well as its
implementation and evaluation plans.Comment: Part of the Program Transformation for Programmability in
Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March
2016, 7 pages, LaTeX, 3 PNG figure
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.Comment: Major revision, to appear in SIAM Revie
A Low Level Component Model enabling Resource Specialization of HPC Applications
Scientific applications are still getting more complex, e.g. to improve their accuracy by taking into account more phenomena. Moreover, computing infrastructures are continuing their fast evolution. Therefore, software engineering is becoming a major issue to achieve easiness of development, portability, simple maintenance, while achieving high performance. Software component model is a promising approach, which enables to manipulate the software architecture of an application. However, existing models do not capture enough resource specificities. This paper proposes a low level component model (L2C) that supports directly native connectors such as MPI, shared memory and method invocation. L2C is intended to be used as a back end by a ''compiler'' (such as HLCM) to generate an application assembly specific to a given machine. This paper shows on a typical domain decomposition use case that \llc can achieve the same performance as native implementations, while gaining benefits such as enabling resource specialization capabilities.Les applications scientifiques continuent de devenir de plus en plus complexes, par exemple pour améliorer leur précision en intégrant davantage de phénomènes à simuler. Par ailleurs, les infrastructures de calcul continuent leur rapide évolution. Ainsi, l'ingénierie logicielle devient un défi très important afin de permettre une facilité de développement, la portabilité des codes, et une maintenance acceptable tout en permettant de hautes performances. Les modèles de composants logiciels offrent une approche prometteuse en permettant de manipuler l'architecture logicielle d'une application. Cependant, les modèles existant ne permettent pas de capturer suffisamment les spécificités des ressources de calcul. Cet article propose un modèle de composant logiciel ''bas niveau'' (L2C) qui permet l'intègration native de connecteurs tels que MPI, la mémoire partagée ou l'invocation de méthode. L2C est destiné à être utilisé en tant que langage de sortie d'un ''compilateur'' (tel que HLCM) générant un assemblage d'une application spécifique à une machine et à une exécution. Cet article montre sur un cas d'étude typique de décomposition de domaines que L2C permet d'atteindre les même performances que les applications natives, tout en offrant des possibilités d'optimisation par rapport aux capacités des ressources
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
- …