Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters
The classical method of determining the atomic structure of complex molecules
by analyzing diffraction patterns is currently undergoing drastic developments.
Modern techniques for producing extremely bright and coherent X-ray lasers make
it possible to intercept a beam of streaming particles with ultrashort,
high-energy X-ray pulses. Through machine learning methods, the data thus
collected can be transformed into a three-dimensional volumetric intensity map
of the particle itself. The computational cost of this problem is so high that
clusters of data-parallel accelerators are required.
We have implemented a distributed and highly efficient algorithm for the
inversion of large collections of diffraction patterns, targeting clusters of
hundreds of GPUs. Given the enormous amount of diffraction data expected in the
foreseeable future, this is the scale required to approach real-time processing
of data at the beam site. Using both real and synthetic data, we examine the
scaling properties of the application and discuss the overall computational
viability of this exciting and novel imaging technique.
Inner product computation for sparse iterative solvers on distributed supercomputers
Recent years have witnessed that iterative Krylov methods are, without re-design, not suitable for distributed supercomputers because of their intensive global communication. It is well accepted that re-engineering Krylov methods for a prescribed computer architecture is necessary and important for achieving higher performance and scalability. This paper focuses on simple and practical ways to re-organize Krylov methods and improve their performance on current heterogeneous distributed supercomputers. In contrast with most current software development for Krylov methods, which usually focuses on efficient matrix-vector multiplications, the paper focuses on how inner products are computed on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. A communication complexity analysis shows how inner product computation can become the performance bottleneck of (inner) product-type iterative solvers on distributed supercomputers due to global communication. Principles for reducing such global communication are discussed. The importance of minimizing communication is demonstrated by experiments using up to 900 processors, carried out on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and the experiments indicate that inner product computation is very likely to be the most challenging kernel for inner-product-based iterative solvers to reach exascale.
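The concrete re-organization is not spelled out in the abstract; the sketch below is only an illustration of the kind of change it argues for, namely batching several distributed inner products into one collective so a Krylov iteration pays for a single global synchronization instead of several. The function name and toy vectors are assumptions, not the paper's code.

```python
# Illustrative only: fuse several distributed inner products into one Allreduce.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def fused_inner_products(pairs):
    """pairs: list of (x_local, y_local) blocks of distributed vectors.
    Returns the global inner products <x, y> for every pair, using a single
    Allreduce on the stacked partial sums instead of one collective per pair."""
    partial = np.array([np.dot(x, y) for x, y in pairs], dtype=np.float64)
    result = np.empty_like(partial)
    comm.Allreduce(partial, result, op=MPI.SUM)   # one collective for all pairs
    return result

# Toy usage: each rank owns a block of two distributed vectors.
n_local = 1000
rng = np.random.default_rng(comm.Get_rank())
u, v = rng.random(n_local), rng.random(n_local)
uu, uv = fused_inner_products([(u, u), (u, v)])
if comm.Get_rank() == 0:
    print("||u||^2 =", uu, "  <u, v> =", uv)
```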
A parallel edge orientation algorithm for quadrilateral meshes
One approach to achieving correct finite element assembly is to ensure that
the local orientation of facets relative to each cell in the mesh is consistent
with the global orientation of that facet. Rognes et al. have shown how to
achieve this for any mesh composed of simplex elements, and deal.II contains a
serial algorithm to construct a consistent orientation of any quadrilateral
mesh of an orientable manifold.
The core contribution of this paper is the extension of this algorithm to
distributed-memory parallel computers, which facilitates its seamless
application as part of a parallel simulation system.
Furthermore, our analysis establishes a link between the well-known
Union-Find algorithm and the construction of a consistent orientation of a
quadrilateral mesh. As a result, existing work on the parallelisation of the
Union-Find algorithm can be easily adapted to construct further parallel
algorithms for mesh orientations.
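For readers unfamiliar with the data structure referenced above, here is a minimal serial Union-Find with path compression and union by rank. It is background illustration only, not the paper's parallel orientation algorithm; the toy "edges" merged at the end are hypothetical.

```python
# Classic Union-Find (disjoint-set) structure: find with path compression,
# union by rank. Parallel variants of this structure are what the paper
# proposes to reuse for constructing mesh orientations.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

# Toy usage: merge edges that must share an orientation class.
uf = UnionFind(6)
for a, b in [(0, 1), (1, 2), (4, 5)]:
    uf.union(a, b)
print(uf.find(2) == uf.find(0))   # True: edges 0 and 2 are in one class
```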
The Statistical Performance of Collaborative Inference
The statistical analysis of massive and complex data sets will require the
development of algorithms that depend on distributed computing and
collaborative inference. Inspired by this, we propose a collaborative framework
that aims to estimate the unknown mean of a random variable. In the model we
present, a certain number of calculation units, distributed across a
communication network represented by a graph, participate in the estimation of
this mean by sequentially receiving independent data from the variable while
exchanging messages via a stochastic matrix defined over the graph. We give
precise conditions on this matrix under which the statistical precision of the
individual units is comparable to that of a (gold standard) virtual centralized
estimate, even though each unit does not have access to all of the data. We
show in particular the fundamental role played by both the non-trivial
eigenvalues of the matrix and the Ramanujan class of expander graphs, which
provide remarkable performance for moderate algorithmic cost.
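The abstract does not give the exact update rule; as a hedged sketch of the kind of scheme it describes, assume each unit alternates between drawing a fresh sample and mixing its running estimate with its neighbours' through a row-stochastic matrix W. The name W, the ring topology, and the update formula below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical illustration of collaborative mean estimation on a graph:
# each unit samples locally, then exchanges messages via a stochastic matrix W.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_rounds, true_mean = 10, 2000, 3.0

# Row-stochastic mixing matrix for a ring: each unit averages itself and its
# two neighbours (illustrative topology, not the paper's choice).
W = np.zeros((n_units, n_units))
for i in range(n_units):
    W[i, i] = W[i, (i - 1) % n_units] = W[i, (i + 1) % n_units] = 1.0 / 3.0

estimates = np.zeros(n_units)
for t in range(1, n_rounds + 1):
    samples = rng.normal(true_mean, 1.0, size=n_units)  # one new sample per unit
    # Running-average local update followed by one round of message exchange.
    estimates = W @ (estimates + (samples - estimates) / t)

print("spread of unit estimates:", estimates.min(), estimates.max())
```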
