CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance
In order to efficiently use the future generations of supercomputers, fault
tolerance and power consumption are two of the prime challenges anticipated by
the High Performance Computing (HPC) community. Checkpoint/Restart (CR) has
been and still is the most widely used technique to deal with hard failures.
Application-level CR is the most effective CR technique in terms of overhead
efficiency but it takes a lot of implementation effort. This work presents the
implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic
Fault Tolerance), which serves two purposes. First, it provides an extendable
library that significantly eases the implementation of application-level
checkpointing. The most basic and frequently used checkpoint data types are
already part of CRAFT and can be directly used out of the box. The library can
be easily extended to add more data types. As a means of overhead reduction, the
library offers a built-in asynchronous checkpointing mechanism and also
supports the Scalable Checkpoint/Restart (SCR) library for node-level
checkpointing. Second, CRAFT provides an easier interface for User-Level
Failure Mitigation (ULFM) based dynamic process recovery, which significantly
reduces the complexity and effort of failure detection and communication
recovery mechanisms. By utilizing both functionalities together, applications
can write application-level checkpoints and recover dynamically from process
failures with very limited programming effort. This work presents the design
and use of our library in detail. The associated overheads are thoroughly
analyzed using several benchmarks.
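The application-level checkpoint/restart pattern described in this abstract can be illustrated with a minimal sketch. This is not CRAFT's C++ API; the names `write_checkpoint`, `read_checkpoint`, `run`, `cp_freq`, and `fail_at` are hypothetical, and the "application" is just a running sum. The sketch only shows the general idea: the application decides which state to save, saves it periodically, and resumes from the last saved state after a failure.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place,
    so an interrupted write never corrupts the last good checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def read_checkpoint(path, default):
    """Return the saved state, or the default if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)


def run(path, n_iter, cp_freq, fail_at=None):
    """Sum the iteration indices 0..n_iter-1, checkpointing every
    cp_freq iterations. fail_at simulates a hard failure (hypothetical)."""
    state = read_checkpoint(path, {"iteration": 0, "total": 0})
    for it in range(state["iteration"], n_iter):
        if fail_at is not None and it == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += it
        state["iteration"] = it + 1
        if state["iteration"] % cp_freq == 0:
            write_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cp = os.path.join(d, "app.cp")
        try:
            run(cp, n_iter=10, cp_freq=3, fail_at=7)
        except RuntimeError:
            pass  # the process "dies"; the checkpoint from iteration 6 survives
        print(run(cp, n_iter=10, cp_freq=3))  # restart resumes at iteration 6
```

Only the work done since the last checkpoint is repeated on restart, which is why the checkpoint frequency trades runtime overhead against lost work.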
Solving Mixed-integer Control Problems by Sum Up Rounding With Guaranteed Integer Gap
Optimal control problems that involve time-dependent discrete decisions have recently received increasing attention, as they arise in practical applications with high potential for optimization. Typical examples are the choice of gears in transport problems or processes in which valves are used. We present rounding strategies for direct methods of optimal control that lead to an approximation of the objective function and constraints whose quality can be estimated by the fineness of the control discretization grid. It is shown for the first time that a finite number of switches suffices in both the linear and the nonlinear case, even in the presence of path and control constraints. A numerical example is given to illustrate the methodology.
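As an illustration of the rounding idea named in the title, the following is a minimal sketch of sum up rounding for a single binary control on a given time grid. It is written from the commonly stated form of the rule, not from the paper's code; the function name and arguments are assumptions. The relaxed control `alpha` takes values in [0, 1] on each interval, and the binary control is switched on whenever the accumulated relaxed control exceeds the accumulated binary control by at least half an interval length.

```python
def sum_up_rounding(alpha, dt):
    """Round a relaxed control alpha[i] in [0, 1] on intervals of
    length dt[i] to a binary control, keeping the integrated
    difference between the two controls small."""
    binary = []
    acc_relaxed = 0.0  # integral of the relaxed control so far
    acc_binary = 0.0   # integral of the rounded control so far
    for a, h in zip(alpha, dt):
        acc_relaxed += a * h
        value = 1 if acc_relaxed - acc_binary >= 0.5 * h else 0
        acc_binary += value * h
        binary.append(value)
    return binary


if __name__ == "__main__":
    print(sum_up_rounding([0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]))
    # prints [1, 0, 1, 0]
```

On a uniform grid the accumulated deviation between the two controls never exceeds half a grid step, so refining the grid tightens the approximation, which matches the abstract's statement that the quality is governed by the fineness of the control discretization.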
Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation
Sparse matrix-vector multiplication (spMVM) is the dominant operation in many
sparse solvers. We investigate performance properties of spMVM with matrices of
various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded
jagged diagonals storage" (pJDS) format is proposed which may substantially
reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme. In our
test scenarios the pJDS format cuts the overall spMVM memory footprint on the
GPGPU by up to 70%, and achieves 95% to 130% of the ELLPACK-R performance.
Using a suitable performance model we identify performance bottlenecks on the
node level that invalidate some types of matrix structures for efficient
multi-GPGPU parallelization. For appropriate sparsity patterns we extend
previous work on distributed-memory parallel spMVM to demonstrate a scalable
hybrid MPI-GPGPU code, achieving efficient overlap of communication and
computation.
Comment: 10 pages, 5 figures. Added reference to other recent sparse matrix
format
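The memory overhead that the abstract attributes to ELLPACK-R comes from padding every row to the length of the longest row. The sketch below shows an ELLPACK-R style layout and spMVM, and compares its stored entry count with a layout that first sorts rows by length and then pads only within small blocks of rows, which is the general idea behind jagged-diagonal formats. It is a simplified CPU illustration, not the paper's pJDS data layout or GPU kernel; all function names and the `block` parameter are assumptions.

```python
def to_ellpack_r(rows):
    """rows[i] is a list of (column, value) pairs for row i.
    Every row is padded to the longest row; row_len keeps the true lengths."""
    width = max(len(r) for r in rows)
    cols = [[c for c, _ in r] + [0] * (width - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    row_len = [len(r) for r in rows]
    return cols, vals, row_len


def spmv_ellpack_r(cols, vals, row_len, x):
    """y = A x, skipping the padding by using the stored row lengths."""
    return [
        sum(vals[i][k] * x[cols[i][k]] for k in range(row_len[i]))
        for i in range(len(row_len))
    ]


def stored_entries_ellpack(rows):
    """Entries stored when all rows are padded to the global maximum."""
    return len(rows) * max(len(r) for r in rows)


def stored_entries_sorted_blocks(rows, block):
    """Entries stored when rows are sorted by length (descending) and
    padded only to the longest row within each block of rows."""
    lengths = sorted((len(r) for r in rows), reverse=True)
    total = 0
    for start in range(0, len(lengths), block):
        chunk = lengths[start:start + block]
        total += len(chunk) * chunk[0]
    return total


if __name__ == "__main__":
    rows = [
        [(0, 1.0)],
        [(0, 2.0), (1, 3.0), (2, 4.0), (3, 5.0)],
        [(2, 6.0)],
        [(1, 7.0), (3, 8.0)],
    ]
    cols, vals, row_len = to_ellpack_r(rows)
    print(spmv_ellpack_r(cols, vals, row_len, [1.0, 1.0, 1.0, 1.0]))
    # prints [1.0, 14.0, 6.0, 15.0]
    print(stored_entries_ellpack(rows))            # prints 16
    print(stored_entries_sorted_blocks(rows, 2))   # prints 10
```

In this toy matrix with 8 nonzeros, one long row forces 16 stored entries under global padding, while sorting and block-wise padding needs 10. The saving depends entirely on how uneven the row lengths are.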
The Network Origins of Economic Growth
In this paper, we propose a new approach to represent a country's outward orientation.
Prior work mostly uses indicators of aggregate trade intensity, trade policy
or trade restrictiveness. Our approach offers a broader perspective as it measures
a country's level of integration not only by its set of direct trade connections with
the rest of the world but also through the full architecture of its second, third, and
all other higher-order connections. We apply our methodology to a sample of 167
countries spanning the period from 1962 to 2009 and perform a Bayesian model-averaging
analysis on the determinants of growth. We find a prominent positive effect of
integration on a country's level of per capita income, while the aforementioned
traditional measures of outward orientation display only a secondary, largely
insignificant, weight. This, we argue, highlights the network basis of economic growth
and adds a novel perspective to the notion of economic openness. We also perform
several sensitivity checks and conclude that our baseline findings are extremely robust
to different data input and alternative assumptions about the computation of
country integration.
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack.
Comment: 32 pages, 11 figures
