Big data scalability of BayesPhylogenies on Harvard's Ozone 12k cores
Computational phylogenetics is classed as a grand-challenge, data-driven problem in the fourth
paradigm of scientific discovery, owing to the exponential growth in genomic data, the computational
challenge it poses, and the potential for vast impact on data-driven biosciences. Petascale and
exascale computing offer the prospect of scaling phylogenetics to big-data levels. However, the
computational complexity of even approximate Bayesian methods for phylogenetic inference demands
scalable analysis for big-data applications, and there has been limited study of the scalability
characteristics of existing computational models on petascale-class massively parallel computers.
In this paper we present a strong- and weak-scaling performance analysis of BayesPhylogenies on
Harvard's Ozone 12k cores. We evaluate multiple data sizes to infer the scaling complexity and find
that strong-scaling techniques, together with novel methods for communication reduction, are
necessary if computational models are to overcome the limitations of emerging complex parallel
architectures with multiple levels of concurrency. The results of this study can guide the design
and implementation of scalable MCMC-based computational models for Bayesian inference on emerging
petascale and exascale systems.
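For context, the strong- and weak-scaling metrics reported by such an analysis follow the standard textbook definitions, restated here for the reader (they are not reproduced from the paper itself):

\[
S(p) = \frac{T(1)}{T(p)}, \qquad E_{\mathrm{strong}}(p) = \frac{S(p)}{p} = \frac{T(1)}{p\,T(p)},
\]

where \(T(p)\) is the wall-clock time to solve a fixed-size problem on \(p\) cores. In weak scaling the per-core workload is held fixed as \(p\) grows, and efficiency is measured as

\[
E_{\mathrm{weak}}(p) = \frac{T(1)}{T(p)}.
\]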
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding the aspects
of MTC applications that can be used to characterize the domain and the
implications of these aspects for middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
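To make the task-graph structure concrete, here is a minimal sketch in C of executing a small DAG of discrete tasks in dependency order. The Task struct and the diamond-shaped example graph are illustrative inventions for this sketch, not part of any MTC middleware discussed in the report:

```c
/* Minimal sketch: run a toy DAG of tasks in dependency order.
 * Task, deps, and run() are illustrative, not a real MTC API. */
#include <stdio.h>

#define NTASKS 4

typedef struct {
    const char *name;
    int deps[NTASKS];  /* deps[j] = 1 if this task depends on task j */
    int done;
} Task;

static void run(Task *t) {
    /* A real MTC system would dispatch this to a worker with minimal
     * overhead; here the "work" is simulated locally. */
    printf("running task %s\n", t->name);
    t->done = 1;
}

int main(void) {
    /* Diamond graph a -> {b, c} -> d; edges are the explicit
     * input/output dependencies in the report's MTC definition. */
    Task tasks[NTASKS] = {
        {"a", {0, 0, 0, 0}, 0},
        {"b", {1, 0, 0, 0}, 0},
        {"c", {1, 0, 0, 0}, 0},
        {"d", {0, 1, 1, 0}, 0},
    };
    int remaining = NTASKS;
    while (remaining > 0) {          /* assumes the graph is acyclic */
        for (int i = 0; i < NTASKS; i++) {
            if (tasks[i].done) continue;
            int ready = 1;
            for (int j = 0; j < NTASKS; j++)
                if (tasks[i].deps[j] && !tasks[j].done) ready = 0;
            if (ready) { run(&tasks[i]); remaining--; }
        }
    }
    return 0;
}
```

In a real MTC setting the inner dispatch loop would be the performance-critical path, which is why the report stresses minimizing per-task dispatch overhead.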
The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to
scientific data mining, object classification, outlier identification, anomaly
detection, image quality assurance, and survey science validation. The data
mining research agenda includes: scalability (to petabyte scales) of existing
machine learning and data mining algorithms; development of grid-enabled
parallel data mining algorithms; design of a robust system for brokering
classifications from the LSST event pipeline (which may produce 10,000 or more
event alerts per night); multi-resolution methods for exploration of petascale
databases; indexing of multi-attribute, multi-dimensional astronomical databases
(beyond spatial indexing) for rapid querying of petabyte databases; and more.
Comment: 5 pages. Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October 2008.
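As a loose illustration of the brokering problem only (not the LSST design), the following C sketch routes classified alerts by a classifier confidence score; the Alert struct, the 0.9 threshold, and the two destinations are all hypothetical:

```c
/* Hypothetical alert-broker sketch; nothing here is LSST API. */
#include <stdio.h>

typedef struct {
    long   id;
    double score;  /* classifier confidence in [0, 1] */
} Alert;

static void route(const Alert *a) {
    /* A production broker would match alerts against many subscriber
     * filters; this sketch applies a single threshold rule. */
    if (a->score > 0.9)
        printf("alert %ld -> follow-up queue (score %.2f)\n", a->id, a->score);
    else
        printf("alert %ld -> archive (score %.2f)\n", a->id, a->score);
}

int main(void) {
    /* In practice the pipeline may emit 10,000+ alerts per night. */
    Alert night[] = { {1, 0.95}, {2, 0.40}, {3, 0.91} };
    for (int i = 0; i < 3; i++)
        route(&night[i]);
    return 0;
}
```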
Petascale computations for Large-scale Atomic and Molecular collisions
Petaflop architectures are currently being utilized efficiently to perform
large-scale computations in Atomic, Molecular and Optical collisions. We solve
the Schrödinger or Dirac equation for the appropriate collision problem using
the R-matrix or R-matrix with pseudo-states approach. We briefly outline the
parallel methodology used and implemented for the current suite of Breit-Pauli
and DARC codes. Various examples are shown of our theoretical results compared
with those obtained from Synchrotron Radiation facilities and from Satellite
observations. We also indicate future directions and implementation of the
R-matrix codes on emerging GPU architectures.
Comment: 14 pages, 5 figures, 3 tables. Chapter in: Workshop on Sustained Simulated Performance 2013, Springer, 2014, edited by Michael Resch, Yevgeniya Kovalenko, Eric Focht, Wolfgang Bez and Hiroaki Kobayashi.
A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence
A hybrid scheme that utilizes MPI for distributed memory parallelism and
OpenMP for shared memory parallelism is presented. The work is motivated by the
desire to achieve exceptionally high Reynolds numbers in pseudospectral
computations of fluid turbulence on emerging petascale, high core-count,
massively parallel processing systems. The hybrid implementation derives from
and augments a well-tested scalable MPI-parallelized pseudospectral code. The
hybrid paradigm leads to a new picture for the domain decomposition of the
pseudospectral grids, which is helpful in understanding, among other things,
the 3D transpose of the global data that is necessary for the parallel fast
Fourier transforms that are the central component of the numerical
discretizations. Details of the hybrid implementation are provided, and
performance tests illustrate the utility of the method. It is shown that the
hybrid scheme achieves near-ideal scalability up to ~20,000 compute cores with a
maximum mean efficiency of 83%. Data are presented that demonstrate how to
choose the optimal number of MPI processes and OpenMP threads in order to
optimize code performance on two different platforms.
Comment: Submitted to Parallel Computing.
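A minimal sketch of the hybrid pattern the paper describes, assuming the common MPI_THREAD_FUNNELED model (only the master thread makes MPI calls) with OpenMP threads inside each rank; the slab decomposition of the spectral grid and the all-to-all transpose for the parallel FFTs are indicated only in comments:

```c
/* Hybrid MPI+OpenMP skeleton; compile with e.g. mpicc -fopenmp. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nprocs;

    /* FUNNELED: only the master thread performs MPI communication,
     * the usual choice for hybrid pseudospectral codes. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    #pragma omp parallel
    {
        /* Each rank owns a slab of the global grid; threads share the
         * loop work within that slab. */
        #pragma omp master
        printf("rank %d of %d running %d threads\n",
               rank, nprocs, omp_get_num_threads());
    }

    /* The 3D transpose needed by the parallel FFTs would be an
     * MPI_Alltoall among ranks at this point (not shown). */
    MPI_Finalize();
    return 0;
}
```

Tuning the ratio of MPI ranks to OpenMP threads per node is exactly the platform-dependent choice the paper's performance data addresses.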