1,505 research outputs found
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well
as in the heterogeneity, of computing resources. The pressure to maintain
reasonable levels of performance and portability forces application developers
to leave the traditional programming paradigms and explore alternative
solutions. PaStiX is a parallel sparse direct solver, based on a dynamic
scheduler for modern hierarchical manycore architectures. In this paper, we
study the benefits and limits of replacing the highly specialized internal
scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and
StarPU. The tasks graph of the factorization step is made available to the two
runtimes, providing them the opportunity to process and optimize its traversal
in order to maximize the algorithm efficiency for the targeted hardware
platform. A comparative study of the performance of the PaStiX solver on top of
its native internal scheduler, PaRSEC, and StarPU frameworks, on different
execution environments, is performed. The analysis highlights that these
generic task-based runtimes achieve comparable results to the
application-optimized embedded scheduler on homogeneous platforms. Furthermore,
they are able to significantly speed up the solver on heterogeneous
environments by taking advantage of the accelerators while hiding the
complexity of their efficient manipulation from the programmer.Comment: Heterogeneity in Computing Workshop (2014
Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications
NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era
Accelerating the Fourier split operator method via graphics processing units
Current generations of graphics processing units have turned into highly
parallel devices with general computing capabilities. Thus, graphics processing
units may be utilized, for example, to solve time dependent partial
differential equations by the Fourier split operator method. In this
contribution, we demonstrate that graphics processing units are capable to
calculate fast Fourier transforms much more efficiently than traditional
central processing units. Thus, graphics processing units render efficient
implementations of the Fourier split operator method possible. Performance
gains of more than an order of magnitude as compared to implementations for
traditional central processing units are reached in the solution of the time
dependent Schr\"odinger equation and the time dependent Dirac equation
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
Solving the electronic structure from a generalized or standard eigenproblem
is often the bottleneck in large scale calculations based on Kohn-Sham
density-functional theory. This problem must be addressed by essentially all
current electronic structure codes, based on similar matrix expressions, and by
high-performance computation. We here present a unified software interface,
ELSI, to access different strategies that address the Kohn-Sham eigenvalue
problem. Currently supported algorithms include the dense generalized
eigensolver library ELPA, the orbital minimization method implemented in
libOMM, and the pole expansion and selected inversion (PEXSI) approach with
lower computational complexity for semilocal density functionals. The ELSI
interface aims to simplify the implementation and optimal use of the different
strategies, by offering (a) a unified software framework designed for the
electronic structure solvers in Kohn-Sham density-functional theory; (b)
reasonable default parameters for a chosen solver; (c) automatic conversion
between input and internal working matrix formats, and in the future (d)
recommendation of the optimal solver depending on the specific problem.
Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800
basis functions) on distributed memory supercomputing architectures.Comment: 55 pages, 14 figures, 2 table
Exploiting a Parametrized Task Graph model for the parallelization of a sparse direct multifrontal solver
International audienceThe advent of multicore processors requires to reconsider the design of high performance computing libraries to embrace portable and effective techniques of parallel software engineering. One of the most promising approaches consists in abstracting an application as a directed acyclic graph (DAG) of tasks. While this approach has been popularized for shared memory environments by the OpenMP 4.0 standard where dependencies between tasks are automatically inferred, we investigate an alternative approach, capable of describing the DAG of task in a distributed setting, where task dependencies are explicitly encoded. So far this approach has been mostly used in the case of algorithms with a regular data access pattern and we show in this study that it can be efficiently applied to a higly irregular numerical algorithm such as a sparse multifrontal QR method. We present the resulting implementation and discuss the potential and limits of this approach in terms of productivity and effectiveness in comparison with more common parallelization techniques. Although at an early stage of development, preliminary results show the potential of the parallel programming model that we investigate in this work
Overcoming rule-based rigidity and connectionist limitations through massively-parallel case-based reasoning
Symbol manipulation as used in traditional Artificial Intelligence has been criticized by neural net researchers for being excessively inflexible and sequential. On the other hand, the application of neural net techniques to the types of high-level cognitive processing studied in traditional artificial intelligence presents major problems as well. A promising way out of this impasse is to build neural net models that accomplish massively parallel case-based reasoning. Case-based reasoning, which has received much attention recently, is essentially the same as analogy-based reasoning, and avoids many of the problems leveled at traditional artificial intelligence. Further problems are avoided by doing many strands of case-based reasoning in parallel, and by implementing the whole system as a neural net. In addition, such a system provides an approach to some aspects of the problems of noise, uncertainty and novelty in reasoning systems. The current neural net system (Conposit), which performs standard rule-based reasoning, is being modified into a massively parallel case-based reasoning version
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
We report on improvements made over the past two decades to our adaptive
treecode N-body method (HOT). A mathematical and computational approach to the
cosmological N-body problem is described, with performance and scalability
measured up to 256k () processors. We present error analysis and
scientific application results from a series of more than ten 69 billion
() particle cosmological simulations, accounting for
floating point operations. These results include the first simulations using
the new constraints on the standard model of cosmology from the Planck
satellite. Our simulations set a new standard for accuracy and scientific
throughput, while meeting or exceeding the computational efficiency of the
latest generation of hybrid TreePM N-body methods.Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC
'1
- …