78 research outputs found

Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters

Author: Ekeberg Tomas
Engblom Stefan
Liu Jing
Publication venue: 'SAGE Publications'
Publication date: 16/12/2014

The classical method of determining the atomic structure of complex molecules by analyzing diffraction patterns is currently undergoing drastic developments. Modern techniques for producing extremely bright and coherent X-ray lasers allow a beam of streaming particles to be intercepted and hit by an ultrashort, high-energy X-ray beam. Through machine learning methods, the data thus collected can be transformed into a three-dimensional volumetric intensity map of the particle itself. The computational complexity associated with this problem is so high that clusters of data-parallel accelerators are required. We have implemented a distributed and highly efficient algorithm for inversion of large collections of diffraction patterns, targeting clusters of hundreds of GPUs. With the enormous amount of diffraction data expected to be produced in the foreseeable future, this is the scale required to approach real-time processing of data at the beam site. Using both real and synthetic data, we look at the scaling properties of the application and discuss the overall computational viability of this exciting and novel imaging technique.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swepub

Inner product computation for sparse iterative solvers on distributed supercomputer

Author: Gu T. -X.
Liu X. -P.
Zhu S. -X.
Publication date: 01/01/2012

Recent years have shown that iterative Krylov methods, unless redesigned, are not suitable for distributed supercomputers because of intensive global communications. It is well accepted that re-engineering Krylov methods for a prescribed computer architecture is necessary and important to achieve higher performance and scalability. The paper focuses on simple and practical ways to reorganize Krylov methods and improve their performance on current heterogeneous distributed supercomputers. In contrast with most current software development for Krylov methods, which usually focuses on efficient matrix-vector multiplications, the paper focuses on how to compute inner products on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. Communication complexity analysis shows how the inner product computation can become the performance bottleneck of (inner) product-type iterative solvers on distributed supercomputers due to global communications. Principles for reducing such global communications are discussed. The importance of minimizing communications is demonstrated by experiments using up to 900 processors. The experiments were carried out on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and the experiments indicate that inner product computation is very likely to be the most challenging kernel for inner product-based iterative solvers to achieve exascale.
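The abstract does not say which reorganizations the authors use. As a minimal sketch of one widely used idea, the Python below fuses two inner products into a single global reduction so that the number of synchronizations per iteration drops. Everything here is an assumption for illustration: `FakeComm` is a hypothetical stand-in for a real communicator (it only counts reductions), and the vectors `r` and `w` are arbitrary placeholders for the vectors a Krylov iteration would need.

```python
class FakeComm:
    """Hypothetical stand-in for a communicator; it only counts global reductions."""

    def __init__(self):
        self.reductions = 0

    def allreduce_sum(self, contributions):
        # contributions holds one list of floats per simulated rank;
        # the result is the element-wise sum over all ranks.
        self.reductions += 1
        return [sum(column) for column in zip(*contributions)]


def local_dot(x, y):
    """Inner product of the locally owned slices (no communication)."""
    return sum(a * b for a, b in zip(x, y))


def dots_separate(comm, r_parts, w_parts):
    """Two inner products, each paying for its own global reduction."""
    rr = comm.allreduce_sum([[local_dot(r, r)] for r in r_parts])[0]
    rw = comm.allreduce_sum(
        [[local_dot(r, w)] for r, w in zip(r_parts, w_parts)]
    )[0]
    return rr, rw


def dots_fused(comm, r_parts, w_parts):
    """Same two inner products, packed into one reduction of length 2."""
    partial = [
        [local_dot(r, r), local_dot(r, w)] for r, w in zip(r_parts, w_parts)
    ]
    rr, rw = comm.allreduce_sum(partial)
    return rr, rw
```

Both functions return the same values; the fused variant performs one reduction instead of two. The message grows by one float, which is usually negligible next to the latency of a global synchronization.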

Oxford University Research Archive

A parallel edge orientation algorithm for quadrilateral meshes

Author: Ham David A.
Homolya Miklós
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 13/05/2015

One approach to achieving correct finite element assembly is to ensure that the local orientation of facets relative to each cell in the mesh is consistent with the global orientation of that facet. Rognes et al. have shown how to achieve this for any mesh composed of simplex elements, and deal.II contains a serial algorithm to construct a consistent orientation of any quadrilateral mesh of an orientable manifold. The core contribution of this paper is the extension of this algorithm for distributed memory parallel computers, which facilitates its seamless application as part of a parallel simulation system. Furthermore, our analysis establishes a link between the well-known Union-Find algorithm and the construction of a consistent orientation of a quadrilateral mesh. As a result, existing work on the parallelisation of the Union-Find algorithm can be easily adapted to construct further parallel algorithms for mesh orientations.
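To make the Union-Find connection concrete, here is a minimal serial sketch in Python. It is not the paper's algorithm or the deal.II implementation. It assumes that "consistent orientation" means opposite edges of every quadrilateral point the same way, and it uses a Union-Find variant with a parity bit per element to track relative edge directions. The names `ParityUnionFind` and `orient_quads` are hypothetical.

```python
class ParityUnionFind:
    """Union-Find where each element stores a flip bit relative to its parent."""

    def __init__(self):
        self.parent = {}
        self.flip = {}

    def find(self, x):
        """Return (root, parity of x relative to root), compressing the path."""
        if x not in self.parent:
            self.parent[x] = x
            self.flip[x] = 0
        if self.parent[x] == x:
            return x, 0
        root, parent_parity = self.find(self.parent[x])
        self.parent[x] = root
        self.flip[x] ^= parent_parity
        return root, self.flip[x]

    def union(self, x, y, parity):
        """Impose flip(x) XOR flip(y) == parity; return False on contradiction."""
        rx, px = self.find(x)
        ry, py = self.find(y)
        if rx == ry:
            return (px ^ py) == parity
        self.parent[rx] = ry
        self.flip[rx] = px ^ py ^ parity
        return True


def orient_quads(quads):
    """Map each edge (min, max) to a (tail, head) direction such that
    opposite edges of every quad point the same way."""
    uf = ParityUnionFind()
    for a, b, c, d in quads:  # vertices listed in cyclic order
        # Opposite pairs: a->b with d->c, and b->c with a->d.
        for (u, v), (w, x) in (((a, b), (d, c)), ((b, c), (a, d))):
            e1, r1 = (min(u, v), max(u, v)), int(u > v)
            e2, r2 = (min(w, x), max(w, x)), int(w > x)
            if not uf.union(e1, e2, r1 ^ r2):
                raise ValueError("no consistent orientation exists")
    directions = {}
    for edge in list(uf.parent):
        _, s = uf.find(edge)
        # s == 0 keeps the reference direction min -> max; s == 1 reverses it.
        directions[edge] = edge if s == 0 else (edge[1], edge[0])
    return directions
```

For example, `orient_quads([(0, 1, 2, 3), (1, 4, 5, 2)])` orients the seven edges of two quads that share the edge between vertices 1 and 2. In this sketch, a contradiction between constraints is reported as a `ValueError`.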

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

The Statistical Performance of Collaborative Inference

Author: Biau Gérard
Bleakley Kevin
Cadre Benoit
Publication date: 01/07/2015

The statistical analysis of massive and complex data sets will require the development of algorithms that depend on distributed computing and collaborative inference. Inspired by this, we propose a collaborative framework that aims to estimate the unknown mean $\theta$ of a random variable $X$. In the model we present, a certain number of calculation units, distributed across a communication network represented by a graph, participate in the estimation of $\theta$ by sequentially receiving independent data from $X$ while exchanging messages via a stochastic matrix $A$ defined over the graph. We give precise conditions on the matrix $A$ under which the statistical precision of the individual units is comparable to that of a (gold standard) virtual centralized estimate, even though each unit does not have access to all of the data. We show in particular the fundamental role played by both the non-trivial eigenvalues of $A$ and the Ramanujan class of expander graphs, which provide remarkable performance for moderate algorithmic cost.
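The abstract does not state the update rule the units follow, so the Python sketch below assumes one for illustration: at step t each unit first mixes its neighbours' estimates through A, then folds in its own new observation with weight 1/t. The ring-shaped matrix, the Gaussian data, and the names `ring_matrix` and `collaborative_mean` are likewise assumptions, not taken from the paper.

```python
import random


def ring_matrix(n):
    """Lazy random walk on a ring of n units: symmetric, hence doubly stochastic."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 0.5
        A[i][(i - 1) % n] += 0.25
        A[i][(i + 1) % n] += 0.25
    return A


def collaborative_mean(A, samples):
    """samples[t][i] is the observation unit i receives at step t + 1.

    Assumed update: theta_t = ((t - 1) * A @ theta_{t-1} + x_t) / t.
    """
    n = len(A)
    theta = [0.0] * n
    for t, x in enumerate(samples, start=1):
        # Message exchange: each unit averages the estimates of its neighbours.
        mixed = [sum(A[i][j] * theta[j] for j in range(n)) for i in range(n)]
        # Local data: fold the new observation into the running estimate.
        theta = [((t - 1) * mixed[i] + x[i]) / t for i in range(n)]
    return theta


# Illustrative run: 8 units, 200 steps, X ~ N(2, 1) (all values are assumptions).
rng = random.Random(0)
samples = [[rng.gauss(2.0, 1.0) for _ in range(8)] for _ in range(200)]
theta = collaborative_mean(ring_matrix(8), samples)
centralized = sum(map(sum, samples)) / (8 * 200)
```

Because the columns of a doubly stochastic A sum to one, the average of the units' estimates under this assumed rule equals the centralized sample mean at every step. How close each individual unit gets to that value depends on how quickly A mixes, which is where the non-trivial eigenvalues mentioned in the abstract come in.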

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Portail HAL UNIV-RENNES