17,474 research outputs found
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
Previous studies have reported that common dense linear algebra operations do
not achieve speed up by using multiple geographical sites of a computational
grid. Because such operations are the building blocks of most scientific
applications, conventional supercomputers are still strongly predominant in
high-performance computing and the use of grids for speeding up large-scale
scientific problems is limited to applications exhibiting parallelism at a
higher level. We have identified two performance bottlenecks in the distributed
memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear
algebra library. First, because ScaLAPACK assumes a homogeneous communication
network, the implementations of ScaLAPACK algorithms lack locality in their
communication pattern. Second, the number of messages sent in the ScaLAPACK
algorithms is significantly greater than other algorithms that trade flops for
communication. In this paper, we present a new approach for computing a QR
factorization -- one of the main dense linear algebra kernels -- of tall and
skinny matrices in a grid computing environment that overcomes these two
bottlenecks. Our contribution is to articulate a recently proposed algorithm
(Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in
order to confine intensive communications (ScaLAPACK calls) within the
different geographical sites. An experimental study conducted on the Grid'5000
platform shows that the resulting performance increases linearly with the
number of geographical sites on large-scale problems (and is in particular
consistently higher than ScaLAPACK's).Comment: Accepted at IPDPS10. (IEEE International Parallel & Distributed
Processing Symposium 2010 in Atlanta, GA, USA.
QCDOC: A 10-teraflops scale computer for lattice QCD
The architecture of a new class of computers, optimized for lattice QCD
calculations, is described. An individual node is based on a single integrated
circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit
IEEE floating point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor
communications and additional control and diagnostic circuitry. The machine's
name, QCDOC, derives from ``QCD On a Chip''.Comment: Lattice 2000 (machines) 8 pages, 4 figure
First-principle molecular dynamics with ultrasoft pseudopotentials: parallel implementation and application to extended bio-inorganic system
We present a plane-wave ultrasoft pseudopotential implementation of
first-principle molecular dynamics, which is well suited to model large
molecular systems containing transition metal centers. We describe an efficient
strategy for parallelization that includes special features to deal with the
augmented charge in the contest of Vanderbilt's ultrasoft pseudopotentials. We
also discuss a simple approach to model molecular systems with a net charge
and/or large dipole/quadrupole moments. We present test applications to
manganese and iron porphyrins representative of a large class of biologically
relevant metallorganic systems. Our results show that accurate
Density-Functional Theory calculations on systems with several hundred atoms
are feasible with access to moderate computational resources.Comment: 29 pages, 4 Postscript figures, revtex
Exploiting Group Symmetry in Semidefinite Programming Relaxations of the Quadratic Assignment Problem
We consider semidefinite programming relaxations of the quadratic assignment problem, and show how to exploit group symmetry in the problem data. Thus we are able to compute the best known lower bounds for several instances of quadratic assignment problems from the problem library: [R.E. Burkard, S.E. Karisch, F. Rendl. QAPLIB — a quadratic assignment problem library. Journal on Global Optimization, 10: 291–403, 1997]. AMS classification: 90C22, 20Cxx, 70-08.quadratic assignment problem;semidefinite programming;group sym- metry
Inviwo -- A Visualization System with Usage Abstraction Levels
The complexity of today's visualization applications demands specific
visualization systems tailored for the development of these applications.
Frequently, such systems utilize levels of abstraction to improve the
application development process, for instance by providing a data flow network
editor. Unfortunately, these abstractions result in several issues, which need
to be circumvented through an abstraction-centered system design. Often, a high
level of abstraction hides low level details, which makes it difficult to
directly access the underlying computing platform, which would be important to
achieve an optimal performance. Therefore, we propose a layer structure
developed for modern and sustainable visualization systems allowing developers
to interact with all contained abstraction levels. We refer to this interaction
capabilities as usage abstraction levels, since we target application
developers with various levels of experience. We formulate the requirements for
such a system, derive the desired architecture, and present how the concepts
have been exemplary realized within the Inviwo visualization system.
Furthermore, we address several specific challenges that arise during the
realization of such a layered architecture, such as communication between
different computing platforms, performance centered encapsulation, as well as
layer-independent development by supporting cross layer documentation and
debugging capabilities
Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures
In this paper, we perform an empirical evaluation of the Parallel External
Memory (PEM) model in the context of geometric problems. In particular, we
implement the parallel distribution sweeping framework of Ajwani, Sitchinava
and Zeh to solve batched 1-dimensional stabbing max problem. While modern
processors consist of sophisticated memory systems (multiple levels of caches,
set associativity, TLB, prefetching), we empirically show that algorithms
designed in simple models, that focus on minimizing the I/O transfers between
shared memory and single level cache, can lead to efficient software on current
multicore architectures. Our implementation exhibits significantly fewer
accesses to slow DRAM and, therefore, outperforms traditional approaches based
on plane sweep and two-way divide and conquer.Comment: Longer version of ESA'13 pape
Space Station communications and tracking systems modeling and RF link simulation
In this final report, the effort spent on Space Station Communications and Tracking System Modeling and RF Link Simulation is described in detail. The effort is mainly divided into three parts: frequency division multiple access (FDMA) system simulation modeling and software implementation; a study on design and evaluation of a functional computerized RF link simulation/analysis system for Space Station; and a study on design and evaluation of simulation system architecture. This report documents the results of these studies. In addition, a separate User's Manual on Space Communications Simulation System (SCSS) (Version 1) documents the software developed for the Space Station FDMA communications system simulation. The final report, SCSS user's manual, and the software located in the NASA JSC system analysis division's VAX 750 computer together serve as the deliverables from LinCom for this project effort
- …