17,474 research outputs found

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Author: Camille Coti
Emmanuel Agullo
Jack Dongarra
Julien Langou
Thomas Herault
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/12/2009

Previous studies have reported that common dense linear algebra operations do not achieve speedup when using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to combine a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's).
Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA).
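The idea of confining heavy work to each site and exchanging only small factors between sites can be illustrated with a minimal tall-skinny QR sketch. This is not the paper's implementation: it uses a flat, single-level reduction, a plain modified Gram-Schmidt routine stands in for the per-site ScaLAPACK calls, and the names `mgs_r` and `tsqr_r` are hypothetical. It also assumes every row block has full column rank and at least as many rows as columns.

```python
import math


def mgs_r(rows):
    """Return the n-by-n upper-triangular R factor of an m-by-n matrix
    (given as a list of rows) using modified Gram-Schmidt.
    Assumes full column rank, so no diagonal entry of R is zero."""
    m, n = len(rows), len(rows[0])
    cols = [[rows[i][j] for i in range(m)] for j in range(n)]  # column copies
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]  # normalized j-th direction
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return r


def tsqr_r(rows, num_blocks):
    """R factor of a tall-skinny matrix via one level of block reduction.
    Each contiguous row block plays the role of one site (an assumption
    made for illustration); only the small R factors are combined."""
    size = -(-len(rows) // num_blocks)  # ceiling division
    blocks = [rows[i:i + size] for i in range(0, len(rows), size)]
    local_rs = [mgs_r(block) for block in blocks]         # independent local work
    stacked = [row for r in local_rs for row in r]        # small n-by-n pieces
    return mgs_r(stacked)                                 # final combine step


if __name__ == "__main__":
    a = [[1.0, 1.0], [2.0, 2.0], [3.0, 5.0], [4.0, 5.0],
         [5.0, 2.0], [6.0, 1.0], [7.0, 2.0], [8.0, 5.0]]
    print(tsqr_r(a, 2))
    print(mgs_r(a))  # same R up to rounding
```

Because both routines produce an R factor with a positive diagonal, the blocked result matches the direct factorization up to rounding. In a grid setting the point is that only the n-by-n factors, not the full row blocks, would need to cross site boundaries.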

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1

QCDOC: A 10-teraflops scale computer for lattice QCD

Author: Chen D.
Christ N. H.
Cristian C.
Dong Z.
Gara A.
Garg K.
Joo B.
Kim C.
Levkova L.
Liao X.
Mawhinney R. D.
Ohta S.
Wettig T.
Publication venue: 'Elsevier BV'
Publication date: 17/08/2000

The architecture of a new class of computers, optimized for lattice QCD calculations, is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating point unit, 4 Mbytes of memory, 8 Gbit/sec nearest-neighbor communications and additional control and diagnostic circuitry. The machine's name, QCDOC, derives from "QCD On a Chip".
Comment: Lattice 2000 (machines), 8 pages, 4 figures

arXiv.org e-Print Archive

UNT Digital Library

CERN Document Server

First-principle molecular dynamics with ultrasoft pseudopotentials: parallel implementation and application to extended bio-inorganic systems

Author: Car R.
De Angelis F.
Giannozzi P.
Publication venue: 'AIP Publishing'
Publication date: 21/11/2003

We present a plane-wave ultrasoft pseudopotential implementation of first-principle molecular dynamics, which is well suited to model large molecular systems containing transition metal centers. We describe an efficient strategy for parallelization that includes special features to deal with the augmented charge in the context of Vanderbilt's ultrasoft pseudopotentials. We also discuss a simple approach to model molecular systems with a net charge and/or large dipole/quadrupole moments. We present test applications to manganese and iron porphyrins representative of a large class of biologically relevant metallorganic systems. Our results show that accurate Density-Functional Theory calculations on systems with several hundred atoms are feasible with access to moderate computational resources.
Comment: 29 pages, 4 Postscript figures, revtex

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Exploiting group symmetry in semidefinite programming relaxations of the quadratic assignment problem.

Author: Klerk E. de
Sotirov R.

Research Papers in Economics

Exploiting Group Symmetry in Semidefinite Programming Relaxations of the Quadratic Assignment Problem

Author: Klerk E. de
Sotirov R.

We consider semidefinite programming relaxations of the quadratic assignment problem, and show how to exploit group symmetry in the problem data. Thus we are able to compute the best known lower bounds for several instances of quadratic assignment problems from the problem library: [R.E. Burkard, S.E. Karisch, F. Rendl. QAPLIB - a quadratic assignment problem library. Journal of Global Optimization, 10: 291–403, 1997].
AMS classification: 90C22, 20Cxx, 70-08.
Keywords: quadratic assignment problem; semidefinite programming; group symmetry

Research Papers in Economics

Inviwo -- A Visualization System with Usage Abstraction Levels

Author: Englund Rickard
Falk Martin
Hotz Ingrid
Jönsson Daniel
Kottravel Sathish
Ropinski Timo
Steneteg Peter
Sundén Erik
Ynnerman Anders
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/10/2019

The complexity of today's visualization applications demands specific visualization systems tailored for the development of these applications. Frequently, such systems utilize levels of abstraction to improve the application development process, for instance by providing a data flow network editor. Unfortunately, these abstractions result in several issues, which need to be circumvented through an abstraction-centered system design. Often, a high level of abstraction hides low-level details, which makes it difficult to directly access the underlying computing platform, although such access would be important to achieve optimal performance. Therefore, we propose a layer structure developed for modern and sustainable visualization systems allowing developers to interact with all contained abstraction levels. We refer to these interaction capabilities as usage abstraction levels, since we target application developers with various levels of experience. We formulate the requirements for such a system, derive the desired architecture, and present how the concepts have been realized, by way of example, within the Inviwo visualization system. Furthermore, we address several specific challenges that arise during the realization of such a layered architecture, such as communication between different computing platforms, performance-centered encapsulation, as well as layer-independent development by supporting cross-layer documentation and debugging capabilities.

arXiv.org e-Print Archive

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures

Author: A. Aggarwal
D. Ajwani
J. Singler
J.L. Bentley
K. Mehlhorn
S. Kang
Publication date: 01/01/2013

In this paper, we perform an empirical evaluation of the Parallel External Memory (PEM) model in the context of geometric problems. In particular, we implement the parallel distribution sweeping framework of Ajwani, Sitchinava and Zeh to solve the batched 1-dimensional stabbing max problem. While modern processors consist of sophisticated memory systems (multiple levels of caches, set associativity, TLB, prefetching), we empirically show that algorithms designed in simple models, which focus on minimizing the I/O transfers between shared memory and a single-level cache, can lead to efficient software on current multicore architectures. Our implementation exhibits significantly fewer accesses to slow DRAM and, therefore, outperforms traditional approaches based on plane sweep and two-way divide and conquer.
Comment: Longer version of ESA'13 paper
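For orientation, the kind of sequential plane-sweep baseline the abstract compares against can be sketched as follows. This is not the parallel distribution sweeping framework itself. It assumes the usual formulation of the problem (weighted closed intervals, and for each query point the maximum weight among intervals containing it), and the function name `stabbing_max` is hypothetical.

```python
import heapq


def stabbing_max(intervals, queries):
    """Batched 1-D stabbing max by a left-to-right sweep.

    intervals: list of (lo, hi, weight) with closed endpoints.
    queries:   list of points.
    Returns, in the original query order, the largest weight among
    intervals containing each point, or None if no interval contains it.
    """
    order = sorted(range(len(queries)), key=lambda i: queries[i])
    pending = sorted(intervals)          # sorted by left endpoint
    heap = []                            # max-heap on weight: (-weight, hi)
    answers = [None] * len(queries)
    nxt = 0
    for qi in order:
        x = queries[qi]
        # Activate every interval that starts at or before x.
        while nxt < len(pending) and pending[nxt][0] <= x:
            _, hi, w = pending[nxt]
            heapq.heappush(heap, (-w, hi))
            nxt += 1
        # Lazily discard heaviest candidates that ended before x.
        while heap and heap[0][1] < x:
            heapq.heappop(heap)
        if heap:
            answers[qi] = -heap[0][0]
    return answers


if __name__ == "__main__":
    ivs = [(0, 10, 1), (2, 5, 7), (4, 8, 3)]
    print(stabbing_max(ivs, [1, 3, 6, 9, 11, 4]))  # [1, 7, 3, 1, None, 4 -> 7]
```

Expired intervals are removed lazily, only when they reach the top of the heap; this is safe because the queries are processed in increasing order, so an interval that has ended can never become relevant again.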

arXiv.org e-Print Archive

Crossref

Space Station communications and tracking systems modeling and RF link simulation

Author: Chie Chak M.
Lindsey William C.
Tsang Chit-Sang

This final report describes in detail the effort spent on Space Station Communications and Tracking System Modeling and RF Link Simulation. The effort is divided into three main parts: frequency division multiple access (FDMA) system simulation modeling and software implementation; a study on the design and evaluation of a functional computerized RF link simulation/analysis system for the Space Station; and a study on the design and evaluation of the simulation system architecture. This report documents the results of these studies. In addition, a separate User's Manual on the Space Communications Simulation System (SCSS) (Version 1) documents the software developed for the Space Station FDMA communications system simulation. The final report, the SCSS user's manual, and the software located on the NASA JSC system analysis division's VAX 750 computer together serve as the deliverables from LinCom for this project effort.

NASA Technical Reports Server