
    Generating optimized Fourier interpolation routines for density functional theory using SPIRAL

    Upsampling of a multi-dimensional data set is an operation with wide application in image processing and in quantum mechanical calculations using density functional theory. For small upsampling factors, as seen in the quantum chemistry code ONETEP, a time-shift based implementation, which shifts samples by a fraction of the original grid spacing to fill in the intermediate values using a frequency-domain Fourier property, can be a good choice: readily available, highly optimized multidimensional FFT implementations are leveraged at the expense of extra passes through the entire working set. In this paper we present an optimized variant of time-shift based upsampling. Since ONETEP handles threading, we address the memory hierarchy and SIMD vectorization, and focus on problem dimensions relevant for ONETEP. We present a formalization of this operation within the SPIRAL framework and demonstrate auto-generated and auto-tuned interpolation libraries. We compare the performance of our generated code against the previous best implementations built on highly optimized FFT libraries (FFTW and MKL), demonstrating speed-ups averaging 3x in isolation and of up to 15% within ONETEP.
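
    The frequency-domain shift property the abstract relies on can be illustrated in one dimension: a circular shift by a fraction of the grid spacing corresponds to multiplying the spectrum by a phase ramp, and interleaving the shifted copies fills in the fine grid. Below is a minimal NumPy sketch of this idea (function name and structure are ours, not the paper's; the actual routines are SPIRAL-generated, three-dimensional, and SIMD-vectorized).

        import numpy as np

        def timeshift_upsample(f, m):
            """Upsample a 1-D periodic signal by an integer factor m:
            each fractional shift j/m of the grid spacing is obtained by
            multiplying the spectrum with a phase ramp, and the m shifted
            copies are interleaved on the fine grid."""
            n = len(f)
            F = np.fft.fft(f)
            k = np.fft.fftfreq(n, d=1.0 / n)    # integer frequency indices
            out = np.empty(m * n)
            for j in range(m):
                shift = j / m                   # fraction of one grid spacing
                phase = np.exp(2j * np.pi * k * shift / n)
                out[j::m] = np.fft.ifft(F * phase).real  # real input assumed
            return out

        # Example: 4x upsampling of a sine (odd n sidesteps the Nyquist bin)
        x = np.linspace(0, 2 * np.pi, 33, endpoint=False)
        fine = timeshift_upsample(np.sin(x), 4)          # 132 samples

    Note that each shifted copy reruns a full FFT/IFFT pair over the working set, which is exactly the memory-traffic cost the paper's fused, auto-tuned variant targets.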

    A fast solver for linear systems with displacement structure

    We describe a fast solver for linear systems with reconstructable Cauchy-like structure, which requires O(rn^2) floating point operations and O(rn) memory locations, where n is the size of the matrix and r its displacement rank. The solver is based on the application of the generalized Schur algorithm to a suitable augmented matrix, under some assumptions on the knots of the Cauchy-like matrix. It includes various pivoting strategies, already discussed in the literature, and a new algorithm, which only requires reconstructability. We have developed a software package, written in Matlab and C-MEX, which provides a robust implementation of the above method. Our package also includes solvers for Toeplitz(+Hankel)-like and Vandermonde-like linear systems, as these structures can be reduced to Cauchy-like by fast and stable transforms. Numerical experiments demonstrate the effectiveness of the software. Comment: 27 pages, 6 figures.
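
    The displacement structure being exploited is compact to state: a Cauchy-like matrix C with knots x, y and rank-r generators G, H satisfies D_x C - C D_y = G H^T, so C is fully reconstructable from O(rn) data. The NumPy sketch below (ours, not the package's Matlab/C-MEX code) illustrates only this equation, not the generalized Schur solver itself.

        import numpy as np

        rng = np.random.default_rng(0)
        n, r = 6, 2                            # matrix size, displacement rank

        x = np.arange(1, n + 1, dtype=float)   # knots; x and y must be disjoint
        y = x + 0.5
        G = rng.standard_normal((n, r))        # generators: O(rn) storage
        H = rng.standard_normal((n, r))

        # Entrywise reconstruction: C[i, j] = (G[i, :] @ H[j, :]) / (x[i] - y[j])
        C = (G @ H.T) / (x[:, None] - y[None, :])

        # Displacement equation D_x C - C D_y = G H^T holds to rounding error,
        # confirming that C has displacement rank at most r
        residual = np.diag(x) @ C - C @ np.diag(y) - G @ H.T
        print(np.linalg.norm(residual))        # ~1e-15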

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.

    Radio Astronomy Image Reconstruction in the Big Data Era

    Next generation radio interferometric telescopes pave the way for the future of radio astronomy, with extremely wide fields of view and precision polarimetry not possible at optical wavelengths, at the cost of more demanding image reconstruction. These instruments will be used to map large-scale Galactic and extra-galactic structures at higher resolution and fidelity than ever before. However, radio astronomy has entered the era of big data, where the sheer volume of data limits the expected sensitivity and fidelity of the instruments. New image reconstruction methods are critical to meet the data requirements needed to obtain new scientific discoveries in radio astronomy. To meet this need, this work takes traditional radio astronomical imaging and introduces state-of-the-art image reconstruction frameworks built on sparse image reconstruction algorithms. The software package PURIFY, developed in this work, uses convex optimization algorithms (i.e. the alternating direction method of multipliers) to solve for the reconstructed image. We design, implement, and apply distributed radio interferometric image reconstruction methods for the message passing interface (MPI), showing that PURIFY scales to big data image reconstruction on computing clusters. We design a distributed wide-field imaging algorithm for non-coplanar arrays, while providing new theoretical insights for wide-field imaging. It is shown that PURIFY's methods provide higher dynamic range than traditional image reconstruction methods, providing a more accurate and detailed sky model for real observations. This sets the stage for state-of-the-art image reconstruction methods to be distributed and applied to next generation interferometric telescopes, where they can be used to meet big data challenges and to make new scientific discoveries in radio astronomy and astrophysics.
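
    To make the optimization machinery concrete, here is a toy NumPy sketch of the alternating direction method of multipliers applied to an l1-regularised least-squares problem, the generic form behind sparse reconstruction. It is our illustration only: PURIFY works with interferometric measurement operators and constrained formulations, not this dense toy problem.

        import numpy as np

        def soft_threshold(v, t):
            # proximal operator of the l1 norm
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def admm_lasso(A, b, lam, rho=1.0, iters=200):
            """Minimise 0.5*||Ax - b||^2 + lam*||x||_1 with ADMM:
            ridge solve for x, soft-threshold for z, dual ascent for u."""
            n = A.shape[1]
            x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
            AtA = A.T @ A + rho * np.eye(n)
            Atb = A.T @ b
            for _ in range(iters):
                x = np.linalg.solve(AtA, Atb + rho * (z - u))
                z = soft_threshold(x + u, lam / rho)
                u += x - z
            return z

        # Recover a 5-sparse vector from 40 random measurements of length 100
        rng = np.random.default_rng(1)
        x_true = np.zeros(100)
        x_true[rng.choice(100, 5, replace=False)] = 1.0
        A = rng.standard_normal((40, 100)) / np.sqrt(40)
        x_hat = admm_lasso(A, A @ x_true, lam=0.05)
        print(np.linalg.norm(x_hat - x_true))   # small reconstruction error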

    Extending OmpSs for OpenCL kernel co-execution in heterogeneous systems

    Heterogeneous systems offer very high potential performance but are difficult to program. OmpSs is a well-known framework for task-based parallel applications and an interesting tool for simplifying the programming of these systems. However, it does not support co-execution of a single OpenCL kernel instance on several compute devices. To overcome this limitation, this paper presents an extension of the OmpSs framework that addresses two main objectives: the automatic division of datasets among several devices and the management of their memory address spaces. To adapt to different kinds of applications, the data division can be performed by the novel HGuided load balancing algorithm or by the well-known Static and Dynamic algorithms. All this is accomplished with negligible impact on programmability. Experimental results reveal that there is always one load balancing algorithm that improves the performance and energy consumption of the system.
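
    For intuition, a guided, heterogeneity-aware partitioning can be sketched in a few lines of Python. The chunking rule below is hypothetical (chunks shrink with remaining work and grow with relative device speed); the paper's HGuided algorithm and its exact formula may differ, and a real runtime would hand out chunks on demand as devices finish rather than in this fixed round-robin order.

        def hguided_chunks(total, speeds, k=2, min_chunk=64):
            """Split `total` work-items into per-device chunks: each chunk is
            proportional to the device's share of aggregate speed and to the
            work still remaining, with a floor to bound scheduling overhead."""
            remaining = total
            schedule = []                       # (device, chunk) dispatch order
            while remaining > 0:
                for dev, s in enumerate(speeds):
                    if remaining <= 0:
                        break
                    chunk = max(min_chunk, int(remaining * s / (k * sum(speeds))))
                    chunk = min(chunk, remaining)
                    schedule.append((dev, chunk))
                    remaining -= chunk
            return schedule

        # e.g. a GPU rated 4x faster than the CPU splitting 10000 work-items
        for dev, size in hguided_chunks(10000, speeds=[4.0, 1.0]):
            print(f"device {dev}: {size} items")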

    Remembering Forward: Neural Correlates of Memory and Prediction in Human Motor Adaptation

    We used functional magnetic resonance imaging (fMRI), a robotic manipulandum, and systems identification techniques to examine neural correlates of predictive compensation for spring-like loads during goal-directed wrist movements in neurologically intact humans. Although the load changed unpredictably from one trial to the next, subjects nevertheless used sensorimotor memories from recent movements to predict and compensate for upcoming loads. Prediction enabled subjects to adapt performance so that the task was accomplished with minimum effort. Population analyses of functional images revealed a distributed, bilateral network of cortical and subcortical activity supporting predictive load compensation during visual target capture. Cortical regions – including prefrontal, parietal and hippocampal cortices – exhibited trial-by-trial fluctuations in BOLD signal consistent with the storage and recall of sensorimotor memories or "states" important for spatial working memory. Bilateral activations in associative regions of the striatum demonstrated temporal correlation with the magnitude of kinematic performance error (a signal that could drive reward-optimizing reinforcement learning and the prospective scaling of previously learned motor programs). BOLD signal correlations with load prediction were observed in the cerebellar cortex and red nuclei (consistent with the idea that these structures generate adaptive fusimotor signals facilitating cancellation of expected proprioceptive feedback, as required for conditional feedback adjustments to ongoing motor commands and feedback error learning). Analysis of single-subject images revealed that predictive activity was at least as likely to be observed in more than one of these neural systems as in just one. We conclude, therefore, that motor adaptation is mediated by predictive compensations supported by multiple, distributed, cortical and subcortical structures.
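
    Trial-by-trial prediction of this kind is often captured with a single-state linear model in the motor adaptation literature. The NumPy sketch below is such a generic model (our choice of form and parameters, not necessarily the systems-identification model fit in this study): the predicted load is a leaky accumulation of recent prediction errors.

        import numpy as np

        rng = np.random.default_rng(2)
        trials = 200
        load = rng.choice([2.0, 4.0, 6.0], size=trials)  # load varies per trial

        a, b = 0.9, 0.3                  # retention and error-learning rates
        pred = np.zeros(trials)
        for n in range(trials - 1):
            error = load[n] - pred[n]              # experienced prediction error
            pred[n + 1] = a * pred[n] + b * error  # memory-based update

        # Predictions track recently experienced loads, as in the behaviour
        print(np.corrcoef(pred[1:], load[:-1])[0, 1])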

    Evaluating Component Assembly Specialization for 3D FFT

    The Fast Fourier Transform (FFT) is a widely-used building block for many high-performance scientific applications. Efficient computation of the FFT is paramount for the performance of these applications, which has led to many efforts to implement machine- and computation-specific optimizations. However, no existing FFT library is capable of easily integrating and automating the selection of new and/or unique optimizations. To ease FFT specialization, this paper evaluates the use of component-based software engineering, a programming paradigm which consists in building applications by assembling small software units. Component models are known to have many software engineering benefits but usually offer insufficient performance for high-performance scientific applications. This paper uses the L2C model, a general-purpose high-performance component model, and studies its performance and adaptation capabilities on 3D FFTs. Experiments show that L2C, and components in general, enable easy handling of 3D FFT specializations while obtaining performance comparable to that of well-known libraries. However, a higher-level component model is needed to automatically generate an adequate L2C assembly.
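
    The "small software units" view maps naturally onto the FFT itself: a 3-D transform is an assembly of 1-D transforms applied along each axis, and it is exactly such per-axis units (implementation, axis order, data layout) that a component assembly could specialise. A minimal NumPy sketch of the composition (ours; L2C assemblies are native components, not Python):

        import numpy as np

        def fft3d_from_1d(a):
            """Compose a 3-D FFT from 1-D FFT 'components', one axis at a
            time; swapping the per-axis unit or its order is the kind of
            specialisation a component assembly makes explicit."""
            for axis in range(3):
                a = np.fft.fft(a, axis=axis)
            return a

        a = np.random.default_rng(3).standard_normal((8, 8, 8))
        print(np.allclose(fft3d_from_1d(a), np.fft.fftn(a)))   # True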

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the relation to theory, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era.

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are considered for elliptic equations, as well as explicit and implicit methods for initial-boundary-value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
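
    A classic example of an iterative method suited to this class of architecture is Jacobi relaxation for the Poisson equation: every interior point updates independently from the previous iterate, so each sweep vectorizes and parallelizes trivially. A minimal NumPy sketch, for illustration only (the survey itself long predates NumPy):

        import numpy as np

        def jacobi_poisson(f, h, iters=500):
            """Jacobi sweeps for -laplace(u) = f on a square grid with zero
            boundary values; each update reads only the previous iterate,
            which is what makes the method vector- and parallel-friendly."""
            u = np.zeros_like(f)
            for _ in range(iters):
                u_new = u.copy()
                u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                            + u[1:-1, :-2] + u[1:-1, 2:]
                                            + h * h * f[1:-1, 1:-1])
                u = u_new
            return u

        n = 33
        u = jacobi_poisson(np.ones((n, n)), h=1.0 / (n - 1))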