24,348 research outputs found

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Author: Camille Coti
Emmanuel Agullo
Jack Dongarra
Julien Langou
Thomas Herault
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/12/2009

Previous studies have reported that common dense linear algebra operations do not achieve speedup when run across multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's).

Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA).

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1
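
The abstract above describes confining heavy QR work inside each geographical site and exchanging only small results between sites. A minimal single-process sketch of that reduction idea (often called TSQR) follows. It is an illustration under stated assumptions, not the authors' QCG-OMPI/ScaLAPACK implementation: the name `tsqr_r`, the parameter `num_blocks`, and the use of NumPy are all hypothetical choices, and each row block merely stands in for one site.

```python
import numpy as np


def tsqr_r(a, num_blocks):
    """Return the R factor of a tall-and-skinny matrix via a one-level reduction.

    Hypothetical helper for illustration only; each row block plays the role
    of one geographical site.
    """
    # Split the rows into contiguous blocks, one per (simulated) site.
    blocks = np.array_split(a, num_blocks, axis=0)
    # Local step: every block is factored independently, with no communication.
    local_rs = [np.linalg.qr(block, mode="r") for block in blocks]
    # Reduction step: only the small R factors are gathered and factored again.
    return np.linalg.qr(np.vstack(local_rs), mode="r")
```

For an m-by-n matrix with m much larger than n, the only data that would cross site boundaries in this sketch are the n-by-n triangular factors, which is the locality property the abstract relies on. The R factor returned agrees with a direct QR factorization up to the sign of each row.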

Minimizing Communication for Eigenproblems and the Singular Value Decomposition

Author: Ballard Grey
Demmel James
Dumitriu Ioana
Publication date: 01/01/2010

Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amount of communication required for essentially all $O(n^3)$-like algorithms for linear algebra, including eigenvalue problems and the SVD. Conventional algorithms, including those currently implemented in (Sca)LAPACK, perform asymptotically more communication than these lower bounds require. In this paper we present parallel and sequential eigenvalue algorithms (for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms that do attain these lower bounds, and analyze their convergence and communication costs.

Comment: 43 pages, 11 figures

arXiv.org e-Print Archive

CiteSeerX

Optical-inertia space sextant for an advanced space navigation system, phase B

Author: Auclair G. F.
Derby R. M.
Foley W. D.
Wilczynski J. J.

Optical-inertia space sextant for advanced space navigation system

NASA Technical Reports Server

Optimizing local protocols implementing nonlocal quantum gates

Author: I. Schur
I. V. Schensted
L. K. Grover
M. Horodecki
M. Nielsen
P. W. Shor
S. K. Kim
Scott M. Cohen
Publication venue: 'American Physical Society (APS)'
Publication date: 02/02/2010

We present a method of optimizing recently designed protocols for implementing an arbitrary nonlocal unitary gate acting on a bipartite system. These protocols use only local operations and classical communication with the assistance of entanglement, and are deterministic while also being "one-shot", in that they use only one copy of an entangled resource state. The optimization is in the sense of minimizing the amount of entanglement used, and it is often the case that less entanglement is needed than with an alternative protocol using two-way teleportation.

Comment: 11 pages, 1 figure. This is a companion paper to arXiv:1001.546

arXiv.org e-Print Archive

Crossref

Implementing the SCAN language by neural networks

Author: Brause Rüdiger W.
Publication date: 08/09/2010

The real-time encryption of pictures is an important subject for many applications, e.g. television broadcast stations, network security, etc. The paper shows how the previously introduced SCAN encryption method can be easily implemented using binary neural network autoassociative memory

Hochschulschriftenserver - Universität Frankfurt am Main