30,788 research outputs found
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications
NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era
cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
Modern processor architectures, in addition to having still more cores, also
require still more consideration to memory-layout in order to run at full
capacity. The usefulness of most languages is deprecating as their
abstractions, structures or objects are hard to map onto modern processor
architectures efficiently.
The work in this paper introduces a new abstract machine framework, cphVB,
that enables vector oriented high-level programming languages to map onto a
broad range of architectures efficiently. The idea is to close the gap between
high-level languages and hardware optimized low-level implementations. By
translating high-level vector operations into an intermediate vector bytecode,
cphVB enables specialized vector engines to efficiently execute the vector
operations.
The primary success parameters are to maintain a complete abstraction from
low-level details and to provide efficient code execution across different,
modern, processors. We evaluate the presented design through a setup that
targets multi-core CPU architectures. We evaluate the performance of the
implementation using Python implementations of well-known algorithms: a jacobi
solver, a kNN search, a shallow water simulation and a synthetic stencil
simulation. All demonstrate good performance
Computer Architectures to Close the Loop in Real-time Optimization
© 2015 IEEE.Many modern control, automation, signal processing and machine learning applications rely on solving a sequence of optimization problems, which are updated with measurements of a real system that evolves in time. The solutions of each of these optimization problems are then used to make decisions, which may be followed by changing some parameters of the physical system, thereby resulting in a feedback loop between the computing and the physical system. Real-time optimization is not the same as fast optimization, due to the fact that the computation is affected by an uncertain system that evolves in time. The suitability of a design should therefore not be judged from the optimality of a single optimization problem, but based on the evolution of the entire cyber-physical system. The algorithms and hardware used for solving a single optimization problem in the office might therefore be far from ideal when solving a sequence of real-time optimization problems. Instead of there being a single, optimal design, one has to trade-off a number of objectives, including performance, robustness, energy usage, size and cost. We therefore provide here a tutorial introduction to some of the questions and implementation issues that arise in real-time optimization applications. We will concentrate on some of the decisions that have to be made when designing the computing architecture and algorithm and argue that the choice of one informs the other
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some applications paradigms. Software languages and systems such as NVIDIA's
CUDA and Khronos consortium's open compute language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica-
tions using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications
level programmer and o er some suggested areas for language development and integration
between coarse-grained and ne-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas including: partial di erential equations; graph cluster
metric calculations and random number generation. We report on programming experiences
and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs;
a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scienti c applications developers
Magnetic Cellular Nonlinear Network with Spin Wave Bus for Image Processing
We describe and analyze a cellular nonlinear network based on magnetic
nanostructures for image processing. The network consists of magneto-electric
cells integrated onto a common ferromagnetic film - spin wave bus. The
magneto-electric cell is an artificial two-phase multiferroic structure
comprising piezoelectric and ferromagnetic materials. A bit of information is
assigned to the cell's magnetic polarization, which can be controlled by the
applied voltage. The information exchange among the cells is via the spin waves
propagating in the spin wave bus. Each cell changes its state as a combined
effect of two: the magneto-electric coupling and the interaction with the spin
waves. The distinct feature of the network with spin wave bus is the ability to
control the inter-cell communication by an external global parameter - magnetic
field. The latter makes possible to realize different image processing
functions on the same template without rewiring or reconfiguration. We present
the results of numerical simulations illustrating image filtering, erosion,
dilation, horizontal and vertical line detection, inversion and edge detection
accomplished on one template by the proper choice of the strength and direction
of the external magnetic field. We also present numerical assets on the major
network parameters such as cell density, power dissipation and functional
throughput, and compare them with the parameters projected for other
nano-architectures such as CMOL-CrossNet, Quantum Dot Cellular Automata, and
Quantum Dot Image Processor. Potentially, the utilization of spin waves
phenomena at the nanometer scale may provide a route to low-power consuming and
functional logic circuits for special task data processing
- …