6,014 research outputs found
Learning from the Success of MPI
The Message Passing Interface (MPI) has been extremely successful as a
portable way to program high-performance parallel computers. This success has
occurred in spite of the view of many that message passing is difficult and
that other approaches, including automatic parallelization and directive-based
parallelism, are easier to use. This paper argues that MPI has succeeded
because it addresses all of the important issues in providing a parallel
programming model.Comment: 12 pages, 1 figur
GPU-based Streaming for Parallel Level of Detail on Massive Model Rendering
Rendering massive 3D models in real-time has long been recognized as a very challenging problem because of the limited computational power and memory space available in a workstation. Most existing rendering techniques, especially level of detail (LOD) processing, have suffered from their sequential execution natures, and does not scale well with the size of the models. We present a GPU-based progressive mesh simplification approach which enables the interactive rendering of large 3D models with hundreds of millions of triangles. Our work contributes to the massive rendering research in two ways. First, we develop a novel data structure to represent the progressive LOD mesh, and design a parallel mesh simplification algorithm towards GPU architecture. Second, we propose a GPU-based streaming approach which adopt a frame-to-frame coherence scheme in order to minimize the high communication cost between CPU and GPU. Our results show that the parallel mesh simplification algorithm and GPU-based streaming approach significantly improve the overall rendering performance
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions
Link prediction in very large directed graphs: Exploiting hierarchical properties in parallel
Link prediction is a link mining task that tries to find new edges within a given graph. Among the targets of link prediction there is large directed graphs, which are frequent structures nowadays. The typical sparsity of large graphs demands of high precision predictions in order to obtain usable results. However, the size of those graphs only permits the execution of scalable algorithms. As a trade-off between those two problems we recently proposed a link prediction algorithm for directed graphs that exploits hierarchical properties. The algorithm can be classified as a local score, which entails scalability. Unlike the rest of local scores, our proposal assumes the existence of an underlying model for the data which allows it to produce predictions with a higher precision. We test the validity of its hierarchical assumptions on two clearly hierarchical data sets, one of them based on RDF. Then we test it on a non-hierarchical data set based on Wikipedia to demonstrate its broad applicability. Given the computational complexity of link prediction in very large graphs we also introduce some general recommendations useful to make of link prediction an efficiently parallelized problem.Peer ReviewedPostprint (published version
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well
as in the heterogeneity, of computing resources. The pressure to maintain
reasonable levels of performance and portability forces application developers
to leave the traditional programming paradigms and explore alternative
solutions. PaStiX is a parallel sparse direct solver, based on a dynamic
scheduler for modern hierarchical manycore architectures. In this paper, we
study the benefits and limits of replacing the highly specialized internal
scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and
StarPU. The tasks graph of the factorization step is made available to the two
runtimes, providing them the opportunity to process and optimize its traversal
in order to maximize the algorithm efficiency for the targeted hardware
platform. A comparative study of the performance of the PaStiX solver on top of
its native internal scheduler, PaRSEC, and StarPU frameworks, on different
execution environments, is performed. The analysis highlights that these
generic task-based runtimes achieve comparable results to the
application-optimized embedded scheduler on homogeneous platforms. Furthermore,
they are able to significantly speed up the solver on heterogeneous
environments by taking advantage of the accelerators while hiding the
complexity of their efficient manipulation from the programmer.Comment: Heterogeneity in Computing Workshop (2014
Havens: Explicit Reliable Memory Regions for HPC Applications
Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in the system memory. Existing hardware-based
detection and recovery techniques will be inadequate to manage the presence of
high memory fault rates.
In this paper we propose a partial memory protection scheme based on
region-based memory management. We define the concept of regions called havens
that provide fault protection for program objects. We provide reliability for
the regions through a software-based parity protection mechanism. Our approach
enables critical program objects to be placed in these havens. The fault
coverage provided by our approach is application agnostic, unlike
algorithm-based fault tolerance techniques.Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16),
September 2016, Waltham, MA, US
- …