1,063 research outputs found

Design and implementation of a parallel array operator for the arbitrary remapping of data.

Author: Chamberlain B. L. (Bradford L.)
Choi S. E. (Sung-Eun)
Dietz Steven
Snyder Lawrence
Publication venue: Los Alamos National Laboratory
Publication date: 01/01/2003

The data redistribution, or remapping, functions gather and scatter are long-standing in high-performance computing, having been included in Cray Fortran for decades. In this paper, we present a highly general array operator with powerful gather and scatter capabilities unmatched in other array languages. We discuss an efficient parallel implementation, introducing several new optimizations (run-length encoding, dead array reuse, and direct communication) that lessen the costs associated with the operator's wide applicability. In our implementation of this operator in ZPL, we demonstrate performance comparable to the highly tuned, hand-coded Fortran plus MPI versions of the NAS FT and NAS CG benchmarks.
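As a rough illustration of the two remapping primitives named in this abstract, the following is a minimal sequential sketch in Python. It is not the ZPL operator or its parallel implementation; the function names `gather` and `scatter` and the sample data are assumptions chosen for the example.

```python
def gather(source, indices):
    """Gather: dest[i] = source[indices[i]]."""
    return [source[j] for j in indices]


def scatter(source, indices, size, fill=0):
    """Scatter: dest[indices[i]] = source[i], for a destination of length `size`."""
    dest = [fill] * size
    for value, j in zip(source, indices):
        dest[j] = value  # if an index repeats, the last write wins
    return dest


data = [10, 20, 30, 40]
perm = [2, 0, 3, 1]  # hypothetical index array

gathered = gather(data, perm)                  # reads data through perm
restored = scatter(gathered, perm, len(data))  # writes back through the same perm
```

When the index array is a permutation, scattering through it undoes a gather through it, which is why `restored` equals `data` here.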

UNT Digital Library

Programmable remapper for image processing

Author: Juday Richard D.
Sampsell Jeffrey B.
Publication date: 19/11/1991

A video-rate coordinate remapper includes a memory for storing a plurality of transformations on look-up tables for remapping input images from one coordinate system to another. Such transformations are operator selectable. The remapper includes a collective processor by which certain input pixels of an input image are transformed to a portion of the output image in a many-to-one relationship. The remapper includes an interpolative processor by which the remaining input pixels of the input image are transformed to another portion of the output image in a one-to-many relationship. The invention includes certain specific transforms for creating output images useful for certain defects of visually impaired people. The invention also includes means for shifting input pixels and means for scrolling the output matrix.

NASA Technical Reports Server

MUSIC - Multisimulation Coordinator: Request For Comments

Author: Ö
Mikael Djurfeldt
Publication date: 24/04/2008

MUSIC is an API allowing large-scale neuron simulators that use MPI internally to exchange data during runtime. MUSIC provides mechanisms to transfer massive amounts of event information and continuous values from one parallel application to another. Special care has been taken to ensure that existing simulators can be adapted to MUSIC. In particular, MUSIC handles data transfer between applications that use different time steps and different data allocation strategies. This RFC (Request For Comments) document invites comments on the proposed design and prototype specifications.

Crossref

Nature Precedings

Declarative Parallel Programming in Spreadsheet End-User Development: A Literature Review

Author: Biermann Florian
Publication date: 01/01/2016

The IT University of Copenhagen's Repository

Optimal Compilation of HPF Remappings

Author: Ancourt Corinne
Coelho Fabien
Publication venue: Elsevier BV
Publication date: 01/10/1996

Applications with varying array access patterns require array mappings to be changed dynamically on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings, on data that can be replicated, explicitly through the realign and redistribute directives and implicitly at procedure calls and returns. However, such features are left out of the HPF subset or of the currently discussed HPF kernel for efficiency reasons. This paper presents a new compilation technique to handle HPF remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replications to shorten the remapping time. It is proved optimal: a minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in HPFC, our prototype HPF compiler. Experiments were performed on a DEC Alpha farm.

HAL-MINES ParisTech

Numerical simulation of friction welding processes: An arbitrary Lagrangian-Eulerian approach

Author: Hamed Maien Mohamed Osman
Publication venue: University of Babylon - Department of Mechanical Engineering, Faculty of Engineering
Publication date: 29/08/2022

The development and implementation of a finite strain thermo-viscoplasticity solver with thermomechanical friction contact for numerical simulation of friction welding processes are described. A finite strain associative coupled thermoplasticity model is used, which is suited for the large deformations characteristic of friction welding processes, and which resolves the viscoplastic deformations in the thermomechanically affected zone as well as the elastic stresses in the parent material. To prevent the large deformations from causing large distortions and degrading the simulation accuracy, an arbitrary Lagrangian-Eulerian (ALE) formulation for coupled finite strain thermoplasticity is developed and incorporated into the solver, in which the motion of the reference configuration is represented incrementally in terms of a reference velocity field. Thus, the deformation from the material configuration is required neither explicitly in terms of a deformation field, nor implicitly in terms of the deformation gradient. The solver is implemented using the deal.II library and programmed for distributed memory parallel computing architectures, which reduces simulation run times and enables simulations with larger meshes than would fit on a single computer. The interprocess communications required in such a distributed memory parallel implementation of the ALE formulation and the thermomechanical friction contact are described and implemented. The axisymmetric solver implementation is validated with benchmark problems and used to simulate a direct drive friction welding process.

Cape Town University OpenUCT

Layered architecture for quantum computing

Author: Alexei Yu. Kitaev
Andrew M. Steane
Austin G. Fowler
Christopher M. Dawson
D. Aharonov
Daniel A. Lidar
Dean Copsey
G. N. Nielson
John Paul Shen
Jungsang Kim
M. Oskin
M. Whitney
Michael A. Nielsen
N. Isailovic
Panos Aliferis
Stéphane Beauregard
Thomas G. Draper
Tzvetan S. Metodi
Yasuhiro Takahashi
Publication venue: American Physical Society (APS)
Publication date: 01/07/2012

We develop a layered quantum computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface code quantum error correction. In doing so, we propose a new quantum computer architecture based on optical control of quantum dots. The timescales of physical hardware operations and logical, error-corrected quantum gates differ by several orders of magnitude. By dividing functionality into layers, we can design and analyze subsystems independently, demonstrating the value of our layered architectural approach. Using this concrete hardware platform, we provide resource analysis for executing fault-tolerant quantum algorithms for integer factoring and quantum simulation, finding that the quantum dot architecture we study could solve such problems on the timescale of days. Comment: 27 pages, 20 figures

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

Author: Abstractions
Anandkumar Animashree
Bader Brett W.
Bik Aart JC
Buluç Aydin
Elafrou A.
Katherine Yelick Im
Kincaid David R.
Kotlyar Vladimir
Monakov Alexander
Nandy Payal
Park Jongsoo
Pugh William
Publication venue: Association for Computing Machinery (ACM)
Publication date: 29/06/2020

This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in memory. This lets a compiler emit code that performs complex remappings of nonzeros when converting between formats. We also develop a query language that can extract statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of many formats without hard-coding for any specific combination. Our evaluation shows that the technique generates sparse tensor conversion routines with performance between 1.00 and 2.01× that of hand-optimized versions in SPARSKIT and Intel MKL, two popular sparse linear algebra libraries. And by emitting code that avoids materializing temporaries, which both libraries need for many combinations of source and target formats, our technique outperforms those libraries by 1.78 to 4.01× for CSC/COO to DIA/ELL conversion. Comment: Presented at PLDI 2020
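To make the phase structure described in this abstract concrete, here is a hand-written sketch of a COO-to-CSR conversion in Python. It is not the code the paper's compiler generates. The function name `coo_to_csr`, the argument layout, and the sample matrix are assumptions for illustration. For this pair of formats the coordinate remapping is the identity (each nonzero keeps its (row, column) coordinates), so only the analysis and assembly phases appear.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triples (rows[i], cols[i], vals[i]) to CSR arrays."""
    # Analysis: compute one statistic, the number of nonzeros in each row.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1

    # Assembly, step 1: a prefix sum over the counts gives the row pointer array.
    row_ptr = [0] * (n_rows + 1)
    for r in range(n_rows):
        row_ptr[r + 1] = row_ptr[r] + counts[r]

    # Assembly, step 2: place each nonzero into the next free slot of its row.
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is not modified
    col_idx = [0] * len(vals)
    csr_vals = [0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        csr_vals[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, csr_vals


# Hypothetical 3x3 matrix with four nonzeros, given in unsorted COO order.
row_ptr, col_idx, csr_vals = coo_to_csr(
    3, rows=[2, 0, 0, 2], cols=[0, 1, 2, 2], vals=[5.0, 1.0, 2.0, 6.0]
)
```

Because the row counts are known before assembly starts, the output arrays can be allocated once and filled directly, with no sorted temporary copy of the input.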

arXiv.org e-Print Archive

Crossref

DSpace@MIT