QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
Previous studies have reported that common dense linear algebra operations do
not achieve speedup when run across multiple geographical sites of a
computational grid. Because such operations are the building blocks of most
scientific applications, conventional supercomputers still predominate in
high-performance computing, and the use of grids to speed up large-scale
scientific problems is limited to applications exhibiting parallelism at a
higher level. We have identified two performance bottlenecks in the distributed
memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear
algebra library. First, because ScaLAPACK assumes a homogeneous communication
network, the implementations of ScaLAPACK algorithms lack locality in their
communication pattern. Second, the number of messages sent in the ScaLAPACK
algorithms is significantly greater than in other algorithms that trade flops for
communication. In this paper, we present a new approach for computing a QR
factorization -- one of the main dense linear algebra kernels -- of tall and
skinny matrices in a grid computing environment that overcomes these two
bottlenecks. Our contribution is to couple a recently proposed algorithm
(Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in
order to confine intensive communications (ScaLAPACK calls) within the
different geographical sites. An experimental study conducted on the Grid'5000
platform shows that the resulting performance increases linearly with the
number of geographical sites on large-scale problems (and is in particular
consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS10 (IEEE
International Parallel & Distributed Processing Symposium 2010, Atlanta, GA, USA).
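The core idea behind Communication Avoiding QR for tall and skinny matrices can be sketched as a one-level tall-and-skinny QR (TSQR) reduction: each row block is factored independently, and only the small n x n R factors are then stacked and factored again. The sketch below is a minimal, sequential, pure-Python illustration under that assumption; row blocks stand in for geographical sites, and there is no MPI, ScaLAPACK, or QCG-OMPI involved. The helper names `householder_r` and `tsqr_r` are hypothetical.

```python
import math


def householder_r(rows):
    """R factor (n x n, upper triangular) of an m x n matrix with m >= n,
    computed with Householder reflections. `rows` is a list of row lists."""
    a = [[float(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    if m < n:
        raise ValueError("need at least as many rows as columns")
    for k in range(n):
        # Norm of column k from the diagonal down.
        norm = math.sqrt(sum(a[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue
        # Choose the sign of alpha that avoids cancellation in v[k].
        alpha = -norm if a[k][k] >= 0 else norm
        v = [0.0] * m
        v[k] = a[k][k] - alpha
        for i in range(k + 1, m):
            v[i] = a[i][k]
        vv = sum(x * x for x in v[k:])
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            f = 2.0 * sum(v[i] * a[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                a[i][j] -= f * v[i]
    return [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(rows, num_blocks):
    """One-level TSQR: factor each row block independently (one block per
    hypothetical "site"), then factor the stacked local R factors."""
    n = len(rows[0])
    size = math.ceil(len(rows) / num_blocks)
    blocks = [rows[i:i + size] for i in range(0, len(rows), size)]
    if any(len(b) < n for b in blocks):
        raise ValueError("each block needs at least n rows")
    local_rs = [householder_r(b) for b in blocks]   # no communication needed
    stacked = [row for r in local_rs for row in r]  # only n x n R's are exchanged
    return householder_r(stacked)


A = [[1, 2], [3, 1], [0, 4], [2, 2], [5, 1], [1, 3], [4, 0], [2, 5]]
R_direct = householder_r(A)
R_tsqr = tsqr_r(A, num_blocks=4)
```

For a full-rank A the R factor is unique only up to the sign of each row, so `R_direct` and `R_tsqr` agree elementwise in absolute value. In a real grid deployment, each local factorization would itself be parallel within a site (the abstract assigns that role to ScaLAPACK), and a deeper reduction tree could be used.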
Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10}, lower bounds
were presented on the amount of communication required for essentially all
O(n^3)-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs. Comment: 43 pages, 11 figures
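For orientation, the lower bounds of \cite{BDHS10} can be summarized as follows for a sequential machine whose fast memory holds M words, with constants omitted. The symbols F, W and S are introduced here for illustration only, and the numbers in the worked example are hypothetical.

```latex
\[
  W = \Omega\!\left(\frac{F}{\sqrt{M}}\right), \qquad
  S = \Omega\!\left(\frac{F}{M^{3/2}}\right),
\]
where $F$ is the number of flops, $M$ the fast-memory size in words,
$W$ the number of words moved and $S$ the number of messages.
For a dense $n \times n$ problem with $F = \Theta(n^3)$, taking $n = 10^4$ and
$M = 10^6$ gives $W = \Omega(10^{12}/10^{3}) = \Omega(10^{9})$ words and
$S = \Omega(10^{12}/10^{9}) = \Omega(10^{3})$ messages.
```

The message bound is the word bound divided by M, since no single message can carry more than M words.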
Optical-inertia space sextant for an advanced space navigation system, phase B
Optimizing local protocols implementing nonlocal quantum gates
We present a method of optimizing recently designed protocols for
implementing an arbitrary nonlocal unitary gate acting on a bipartite system.
These protocols use only local operations and classical communication with the
assistance of entanglement, and are deterministic while also being "one-shot",
in that they use only one copy of an entangled resource state. The optimization
is in the sense of minimizing the amount of entanglement used, and it is often
the case that less entanglement is needed than with an alternative protocol
using two-way teleportation. Comment: 11 pages, 1 figure. This is a companion paper to arXiv:1001.546
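To make "the amount of entanglement used" concrete, the sketch below computes the entropy of entanglement (in ebits) of a pure bipartite resource state from its Schmidt coefficients, and the baseline cost of two-way teleportation, in which B's system is teleported to A and back, consuming log2(dim_B) ebits in each direction. This is only a yardstick under those standard definitions; it does not reproduce the paper's optimization, and the function names are hypothetical.

```python
import math


def entanglement_entropy(schmidt_coeffs):
    """Entropy of entanglement, in ebits, of the pure bipartite state
    sum_i c_i |i>_A |i>_B, given its real, non-negative Schmidt coefficients."""
    probs = [c * c for c in schmidt_coeffs]
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("Schmidt coefficients must satisfy sum(c_i^2) = 1")
    # Von Neumann entropy of either reduced state; zero terms contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def two_way_teleportation_ebits(dim_b):
    """Ebits consumed when a dim_b-dimensional system is teleported to the
    other party and back, using one maximally entangled state of Schmidt
    rank dim_b (log2(dim_b) ebits) for each direction."""
    return 2 * math.log2(dim_b)


bell = entanglement_entropy([1 / math.sqrt(2), 1 / math.sqrt(2)])
theta = math.pi / 8
partial = entanglement_entropy([math.cos(theta), math.sin(theta)])
baseline = two_way_teleportation_ebits(2)
```

For two qubits, `baseline` is 2 ebits, so a one-shot protocol whose resource state has lower entropy of entanglement, such as the non-maximally entangled `partial` state, would use less entanglement than two-way teleportation.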
Implementing the SCAN language by neural networks
The real-time encryption of pictures is an important subject for many applications, e.g. television broadcast stations and network security. The paper shows how the previously introduced SCAN encryption method can be easily implemented using a binary neural-network autoassociative memory.
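The abstract does not describe how SCAN encryption works internally. As a rough illustration only, the sketch below shows the general idea of scan-based image scrambling as assumed here: pixels are read out along a secret scan path and written back in raster order, and decryption inverts that permutation. The single clockwise spiral path and the names `spiral_order`, `scan_encrypt` and `scan_decrypt` are hypothetical; the actual SCAN language composes many scan patterns, and the paper's neural-network implementation is not reproduced.

```python
def spiral_order(rows, cols):
    """(r, c) coordinates visiting a rows x cols grid in clockwise spiral
    order, starting at the top-left corner (one hypothetical scan key)."""
    order = []
    top, bottom, left, right = 0, rows - 1, 0, cols - 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):
            order.append((top, c))
        for r in range(top + 1, bottom + 1):
            order.append((r, right))
        if top < bottom:
            for c in range(right - 1, left - 1, -1):
                order.append((bottom, c))
        if left < right:
            for r in range(bottom - 1, top, -1):
                order.append((r, left))
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order


def scan_encrypt(image, order):
    """The k-th output pixel (in raster order) is the k-th pixel visited
    by the scan `order`."""
    rows, cols = len(image), len(image[0])
    flat = [image[r][c] for r, c in order]
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


def scan_decrypt(cipher, order):
    """Invert scan_encrypt by writing each raster-order pixel back to the
    position the scan visited at that step."""
    rows, cols = len(cipher), len(cipher[0])
    plain = [[None] * cols for _ in range(rows)]
    for k, (r, c) in enumerate(order):
        plain[r][c] = cipher[k // cols][k % cols]
    return plain


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
key = spiral_order(3, 3)
cipher = scan_encrypt(image, key)
```

Here `cipher` is `[[1, 2, 3], [6, 9, 8], [7, 4, 5]]`, and `scan_decrypt(cipher, key)` recovers the original image. Security in such a scheme rests entirely on keeping the scan path secret.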