The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large-scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high-end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high-performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
10 Years Later: Cloud Computing is Closing the Performance Gap
Can cloud computing infrastructures provide HPC-competitive performance for
scientific applications broadly? Despite prolific related literature, this
question remains open. Answers are crucial for designing future systems and
democratizing high-performance computing. We present a multi-level approach to
investigate the performance gap between HPC and cloud computing, isolating
different variables that contribute to this gap. Our experiments are divided
into (i) hardware and system microbenchmarks and (ii) user application proxies.
The results show that today's high-end cloud computing can deliver
HPC-competitive performance not only for computationally intensive applications
but also for memory- and communication-intensive applications, at least at
modest scales, thanks to the high-speed memory systems and interconnects and
dedicated batch scheduling now available on some cloud platforms.
The parallelism motifs of genomic data analysis.
Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.