76,920 research outputs found
On the impact of communication complexity in the design of parallel numerical algorithms
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.Comment: Major revision, to appear in SIAM Revie
Limits on Fundamental Limits to Computation
An indispensable part of our lives, computing has also become essential to
industries and governments. Steady improvements in computer hardware have been
supported by periodic doubling of transistor densities in integrated circuits
over the last fifty years. Such Moore scaling now requires increasingly heroic
efforts, stimulating research in alternative hardware and stirring controversy.
To help evaluate emerging technologies and enrich our understanding of
integrated-circuit scaling, we review fundamental limits to computation: in
manufacturing, energy, physical space, design and verification effort, and
algorithms. To outline what is achievable in principle and in practice, we
recall how some limits were circumvented, compare loose and tight limits. We
also point out that engineering difficulties encountered by emerging
technologies may indicate yet-unknown limits.Comment: 15 pages, 4 figures, 1 tabl
Trellis-Based Equalization for Sparse ISI Channels Revisited
Sparse intersymbol-interference (ISI) channels are encountered in a variety
of high-data-rate communication systems. Such channels have a large channel
memory length, but only a small number of significant channel coefficients. In
this paper, trellis-based equalization of sparse ISI channels is revisited. Due
to the large channel memory length, the complexity of maximum-likelihood
detection, e.g., by means of the Viterbi algorithm (VA), is normally
prohibitive. In the first part of the paper, a unified framework based on
factor graphs is presented for complexity reduction without loss of optimality.
In this new context, two known reduced-complexity algorithms for sparse ISI
channels are recapitulated: The multi-trellis VA (M-VA) and the
parallel-trellis VA (P-VA). It is shown that the M-VA, although claimed, does
not lead to a reduced computational complexity. The P-VA, on the other hand,
leads to a significant complexity reduction, but can only be applied for a
certain class of sparse channels. In the second part of the paper, a unified
approach is investigated to tackle general sparse channels: It is shown that
the use of a linear filter at the receiver renders the application of standard
reduced-state trellis-based equalizer algorithms feasible, without significant
loss of optimality. Numerical results verify the efficiency of the proposed
receiver structure.Comment: To be presented at the 2005 IEEE Int. Symp. Inform. Theory (ISIT
2005), September 4-9, 2005, Adelaide, Australi
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed
Recommended from our members
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'
- …