10,292 research outputs found
A Scalable Scheme for Counting Linear Extensions
Peer reviewe
Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC
Fault tolerance is one of the major design goals for HPC. The emergence of
non-volatile memories (NVM) provides a solution to build fault tolerant HPC.
Data in NVM-based main memory are not lost when the system crashes because of
the non-volatility nature of NVM. However, because of volatile caches, data
must be logged and explicitly flushed from caches into NVM to ensure
consistence and correctness before crashes, which can cause large runtime
overhead.
In this paper, we introduce an algorithm-based method to establish crash
consistence in NVM for HPC applications. We slightly extend application data
structures or sparsely flush cache blocks, which introduce ignorable runtime
overhead. Such extension or cache flushing allows us to use algorithm knowledge
to \textit{reason} data consistence or correct inconsistent data when the
application crashes. We demonstrate the effectiveness of our method for three
algorithms, including an iterative solver, dense matrix multiplication, and
Monte-Carlo simulation. Based on comprehensive performance evaluation on a
variety of test environments, we demonstrate that our approach has very small
runtime overhead (at most 8.2\% and less than 3\% in most cases), much smaller
than that of traditional checkpoint, while having the same or less
recomputation cost.Comment: 12 page
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing
- …