A bibliography on parallel and vector numerical algorithms

Author: Ortega J. M.
Voigt R. G.

This is a bibliography of numerical methods. It also includes a number of references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

NASA Technical Reports Server

Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

Author: Fischer James R.
Grosch Chester
Mcanulty Michael
Odonnell John
Storey Owen

Beginning in late 1985, NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center. One year later, the Working Group presented its report, which addressed algorithms, programming languages, architecture, programming environments, the relevance of theory, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, as well as recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.

NASA Technical Reports Server

A linear algebra processor using Monte Carlo methods

Author: Alexandrov Vassil Nikolov
Cadenas Medina Jose Oswaldo
Megson Graham M
Plaks T P
Publication date: 11/09/2003

Central Archive at the University of Reading

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Author: Blum Troels
Kristensen Mads Ruben Burgdorff
Lund Simon Andreas Frimann
Vinter Brian
Publication date: 01/01/2012

Modern processor architectures, in addition to having ever more cores, also require ever more attention to memory layout in order to run at full capacity. The usefulness of most languages is diminishing as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract machine framework, cphVB, that enables vector-oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware-optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to efficiently execute the vector operations. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different modern processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a Jacobi solver, a kNN search, a shallow water simulation and a synthetic stencil simulation. All demonstrate good performance.
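As a rough illustration of the kind of vector-oriented Python code the abstract refers to, the sketch below writes a Jacobi solver for a 2D grid with whole-array NumPy operations. It is not taken from the paper and does not use cphVB. The function names `make_grid`, `jacobi_step` and `jacobi_solve`, the grid size, and the tolerance are all assumptions chosen for the example.

```python
import numpy as np


def make_grid(n=16, hot=1.0):
    """Hypothetical test problem: zero grid with a fixed 'hot' top boundary."""
    grid = np.zeros((n, n))
    grid[0, :] = hot
    return grid


def jacobi_step(grid):
    """One Jacobi sweep: each interior cell becomes the mean of its 4 neighbours.

    The update is a single array expression over shifted slices, with no
    explicit Python loop over cells; boundary cells are left unchanged.
    """
    new = grid.copy()
    new[1:-1, 1:-1] = 0.25 * (
        grid[:-2, 1:-1] + grid[2:, 1:-1] + grid[1:-1, :-2] + grid[1:-1, 2:]
    )
    return new


def jacobi_solve(grid, tol=1e-6, max_iter=10_000):
    """Repeat sweeps until the largest cell change drops below tol."""
    for iteration in range(1, max_iter + 1):
        new = jacobi_step(grid)
        delta = np.abs(new - grid).max()
        grid = new
        if delta < tol:
            return grid, iteration
    return grid, max_iter


if __name__ == "__main__":
    solution, iterations = jacobi_solve(make_grid(16))
    print(f"converged after {iterations} sweeps")
```

Because every sweep is expressed as operations on whole arrays, this is the style of program that a vector bytecode layer of the kind described could in principle intercept and hand to a specialized backend.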

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Computer Architectures to Close the Loop in Real-time Optimization

Author: Constantinides GA
Kerrigan EC
Khusainov B
Picciau A
Suardi A
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/07/2015

© 2015 IEEE. Many modern control, automation, signal processing and machine learning applications rely on solving a sequence of optimization problems, which are updated with measurements of a real system that evolves in time. The solutions of each of these optimization problems are then used to make decisions, which may be followed by changing some parameters of the physical system, thereby resulting in a feedback loop between the computing and the physical system. Real-time optimization is not the same as fast optimization, because the computation is affected by an uncertain system that evolves in time. The suitability of a design should therefore not be judged from the optimality of a single optimization problem, but based on the evolution of the entire cyber-physical system. The algorithms and hardware used for solving a single optimization problem in the office might therefore be far from ideal when solving a sequence of real-time optimization problems. Instead of there being a single, optimal design, one has to trade off a number of objectives, including performance, robustness, energy usage, size and cost. We therefore provide here a tutorial introduction to some of the questions and implementation issues that arise in real-time optimization applications. We will concentrate on some of the decisions that have to be made when designing the computing architecture and algorithm and argue that the choice of one informs the other.

Crossref

Spiral - Imperial College Digital Repository

Mixing multi-core CPUs and GPUs for scientific simulation software

Author: Hawick K.A.
Leist A.
Playne D.P.
Publication venue: 'Massey University'
Publication date: 01/01/2010

Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some application paradigms. Software languages and systems such as NVIDIA's CUDA and the Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and many-core GPUs for data parallelism is necessary. We describe our use of hybrid applications using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications-level programmer and offer some suggested areas for language development and integration between coarse-grained and fine-grained multi-thread systems. We discuss results from three common simulation algorithmic areas: partial differential equations, graph cluster metric calculations, and random number generation. We report on programming experiences and selected performance for these algorithms on single and multiple GPUs, multi-core CPUs, a CellBE, and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scientific applications developers.

Massey Research Online

Magnetic Cellular Nonlinear Network with Spin Wave Bus for Image Processing

Author: Alexander Khitun
Mingqiang Bao
Kang L. Wang
Publication venue: 'Elsevier BV'
Publication date: 30/07/2009

We describe and analyze a cellular nonlinear network based on magnetic nanostructures for image processing. The network consists of magneto-electric cells integrated onto a common ferromagnetic film, the spin wave bus. The magneto-electric cell is an artificial two-phase multiferroic structure comprising piezoelectric and ferromagnetic materials. A bit of information is assigned to the cell's magnetic polarization, which can be controlled by the applied voltage. The cells exchange information via the spin waves propagating in the spin wave bus. Each cell changes its state as a combined effect of two mechanisms: the magneto-electric coupling and the interaction with the spin waves. The distinctive feature of the network with a spin wave bus is the ability to control the inter-cell communication by an external global parameter, the magnetic field. This makes it possible to realize different image processing functions on the same template without rewiring or reconfiguration. We present the results of numerical simulations illustrating image filtering, erosion, dilation, horizontal and vertical line detection, inversion and edge detection accomplished on one template by the proper choice of the strength and direction of the external magnetic field. We also present numerical estimates of the major network parameters such as cell density, power dissipation and functional throughput, and compare them with the parameters projected for other nano-architectures such as CMOL-CrossNet, Quantum Dot Cellular Automata, and Quantum Dot Image Processor. Potentially, the utilization of spin-wave phenomena at the nanometer scale may provide a route to low-power, functional logic circuits for special-task data processing.
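For readers unfamiliar with the image operations named in the abstract, the sketch below gives conventional software definitions of erosion, dilation, inversion and edge detection on a binary image, using NumPy. It says nothing about how the spin-wave network computes them. The 4-neighbour (cross-shaped) neighbourhood, the zero padding at the border, and the function names are assumptions made for this example only.

```python
import numpy as np


def _neighbourhood(img):
    """Stack each pixel with its 4 nearest neighbours (background outside the image)."""
    padded = np.pad(np.asarray(img, dtype=bool), 1, constant_values=False)
    return np.stack([
        padded[1:-1, 1:-1],  # the pixel itself
        padded[:-2, 1:-1],   # neighbour above
        padded[2:, 1:-1],    # neighbour below
        padded[1:-1, :-2],   # neighbour to the left
        padded[1:-1, 2:],    # neighbour to the right
    ])


def dilate(img):
    """A pixel is set if it or any of its 4 neighbours is set."""
    return _neighbourhood(img).any(axis=0)


def erode(img):
    """A pixel stays set only if it and all 4 neighbours are set."""
    return _neighbourhood(img).all(axis=0)


def invert(img):
    """Swap foreground and background."""
    return ~np.asarray(img, dtype=bool)


def edges(img):
    """Foreground pixels removed by erosion, i.e. the inner boundary."""
    binary = np.asarray(img, dtype=bool)
    return binary & ~erode(binary)


if __name__ == "__main__":
    image = np.zeros((5, 5), dtype=bool)
    image[1:4, 1:4] = True  # a 3x3 square of foreground pixels
    print(edges(image).astype(int))
```

All four functions use the same local neighbourhood of every cell and differ only in how the neighbours are combined, which loosely mirrors the abstract's point that several functions can share one template.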

arXiv.org e-Print Archive

Crossref