Efficient Parallel Algorithm for Robot Forward Dynamics Computation
Computing the robot forward dynamics is important for real-time computer simulation of robot arm motion. Two efficient parallel algorithms for computing the forward dynamics for real-time simulation were developed to be implemented on an SIMD computer with n processors, where n is the number of degrees of freedom of the manipulator. The first parallel algorithm, based on the Composite Rigid-Body method, generates the inertia matrix using the parallel Newton-Euler algorithm, the parallel linear recurrence algorithm, and the row-sweep algorithm, and then inverts the inertia matrix to obtain the joint acceleration vector desired at time t. The time complexity of this parallel algorithm is of the order O(n^2) with O(n) processors. The order of the time complexity can be reduced further by implementing the Cholesky factorization procedure on array processors. The second parallel algorithm, based on the conjugate gradient method, computes the joint accelerations with a time complexity of O(n) for multiplication operations and O(n log n) for addition operations. The proposed parallel computation results are compared with those of existing methods.
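The final step of the Composite Rigid-Body approach described above, solving the joint-space equation M(q)·q̈ = τ − bias for the accelerations via a Cholesky factorization of the inertia matrix, can be sketched as follows. The 2-DOF inertia matrix and right-hand side are illustrative placeholders, not values from the paper, and this sequential code only shows the numerical step, not the parallel SIMD scheme.

```python
# Sketch: solve M(q) * qdd = rhs for joint accelerations, the last step
# of Composite Rigid-Body forward dynamics, using Cholesky M = L L^T.
# The 2-DOF matrix and right-hand side below are illustrative placeholders.

def cholesky(M):
    """Cholesky factor L of a symmetric positive-definite matrix M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (M[i][i] - s) ** 0.5
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def solve_spd(M, b):
    """Solve M x = b by forward substitution (L y = b), then back (L^T x = y)."""
    n = len(M)
    L = cholesky(M)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Illustrative 2-DOF case: inertia matrix and (torque minus bias) vector.
M = [[2.0, 0.5], [0.5, 1.0]]
rhs = [1.0, 0.3]
qdd = solve_spd(M, rhs)   # joint accelerations
```

Factoring once and reusing L for many right-hand sides is what makes the Cholesky route cheaper than a general matrix inversion at each simulation time step.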
Exact Sparse Matrix-Vector Multiplication on GPU's and Multicore Architectures
We propose different implementations of the sparse matrix-dense vector
multiplication (SpMV) for finite fields and rings Z/mZ. We take
advantage of graphics card processors (GPUs) and multi-core architectures. Our
aim is to improve the speed of SpMV in the LinBox library, and hence
the speed of its black-box algorithms. In addition, we use this and a new
parallelization of the sigma-basis algorithm in a parallel block Wiedemann rank
implementation over finite fields.
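The core operation the abstract describes, an exact SpMV over Z/mZ, can be sketched in a few lines. This is an illustrative stand-in, not the LinBox implementation: the matrix is in CSR form, products are accumulated as exact integers, and the reduction mod m is deferred to once per row.

```python
# Illustrative sketch (not the LinBox kernel): sparse matrix-dense
# vector product over Z/mZ with the matrix stored in CSR format.
# Python ints never overflow, so the reduction mod m can be delayed
# until each row's accumulation is complete.

def spmv_mod(row_ptr, col_idx, vals, x, m):
    """y = A x over Z/mZ, with A given as CSR arrays (row_ptr, col_idx, vals)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]   # exact integer accumulation
        y.append(acc % m)                    # one reduction per row
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]  over Z/5Z
row_ptr, col_idx, vals = [0, 2, 3], [0, 2, 1], [1, 2, 3]
print(spmv_mod(row_ptr, col_idx, vals, [4, 1, 1], 5))  # → [1, 3]
```

In fixed-width arithmetic (as on a GPU), the same delayed-reduction idea applies, but reductions must be interleaved often enough to avoid overflow.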
Collaborative Computation in Self-Organizing Particle Systems
Many forms of programmable matter have been proposed for various tasks. We
use an abstract model of self-organizing particle systems for programmable
matter which could be used for a variety of applications, including smart paint
and coating materials for engineering or programmable cells for medical uses.
Previous research using this model has focused on shape formation and other
spatial configuration problems (e.g., coating and compression). In this work we
study foundational computational tasks that exceed the capabilities of the
individual constant size memory of a particle, such as implementing a counter
and matrix-vector multiplication. These tasks represent new ways to use these
self-organizing systems, which, in conjunction with previous shape and
configuration work, make the systems useful for a wider variety of tasks. They
can also leverage the distributed and dynamic nature of the self-organizing
system to be more efficient and adaptable than on traditional linear computing
hardware. Finally, we demonstrate applications of similar types of computations
with self-organizing systems to image processing, with implementations of image
color transformation and edge detection algorithms.
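The edge-detection application mentioned above is, at heart, a local 3x3 convolution, a computation that maps naturally onto particles that each hold one pixel and communicate only with neighbors. The grid code below is a centralized, illustrative stand-in for that local rule, not the distributed particle algorithm from the paper.

```python
# Illustrative sketch: edge detection as a 3x3 Sobel-style convolution.
# Each output pixel depends only on its immediate neighborhood, which is
# why such filters suit self-organizing particle systems with
# constant-size local memory. Centralized stand-in, not the paper's
# distributed algorithm.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]   # responds to horizontal intensity gradients

def convolve3x3(img, kernel):
    """Apply a 3x3 kernel to the interior pixels of a 2D grayscale grid."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[dy + 1][dx + 1] * img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out

# A vertical edge: left half dark (0), right half bright (9).
img = [[0, 0, 9, 9] for _ in range(4)]
edges = convolve3x3(img, SOBEL_X)   # strong response along the edge
```

A per-pixel color transformation is even simpler in this model: each particle applies the same function to its own value, with no communication at all.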
Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors
Sparse matrix-vector multiplication (SpMV) is a central building block for
scientific software and graph applications. Recently, heterogeneous processors
composed of different types of cores attracted much attention because of their
flexible core configuration and high energy efficiency. In this paper, we
propose a compressed sparse row (CSR) format based SpMV algorithm utilizing
both types of cores in a CPU-GPU heterogeneous processor. We first
speculatively execute segmented sum operations on the GPU part of a
heterogeneous processor and generate possibly incorrect results. The CPU
part of the same chip is then triggered to re-arrange the predicted partial
sums into a correct resulting vector. On three heterogeneous processors from Intel, AMD
and nVidia, using 20 sparse matrices as a benchmark suite, the experimental
results show that our method obtains significant performance improvement over
the best existing CSR-based SpMV algorithms. The source code of this work is
downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR
Comment: 22 pages, 8 figures, published at Parallel Computing (PARCO).
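The segmented-sum formulation of CSR SpMV that the abstract builds on can be sketched as follows: multiply every nonzero by the matching vector entry to get a flat products array, mark where each row's segment begins, and sum within segments. In the paper the segmented sums run speculatively on the GPU and a CPU pass repairs mispredicted partial sums; this sequential sketch only illustrates the segmented-sum view itself.

```python
# Sketch of CSR SpMV expressed as a segmented sum over the flat array of
# element-wise products. Sequential and illustrative only: the paper's
# contribution is running these sums speculatively on the GPU with a
# CPU fix-up pass, which this sketch does not model.

def csr_spmv_segsum(row_ptr, col_idx, vals, x):
    n_rows = len(row_ptr) - 1
    # flat element-wise products over all nonzeros
    products = [v * x[c] for v, c in zip(vals, col_idx)]
    # head flag per nonzero: does a new row's segment start here?
    heads = [False] * (len(vals) + 1)
    heads[len(vals)] = True                  # sentinel to flush the last segment
    seg_rows = []                            # which row owns each segment
    for i in range(n_rows):
        if row_ptr[i] < row_ptr[i + 1]:      # empty rows own no segment
            heads[row_ptr[i]] = True
            seg_rows.append(i)
    # segmented sum: restart the accumulator at every head flag
    y = [0.0] * n_rows
    seg, acc = -1, 0.0
    for k in range(len(vals) + 1):
        if heads[k]:
            if seg >= 0:
                y[seg_rows[seg]] = acc
            seg, acc = seg + 1, 0.0
        if k < len(vals):
            acc += products[k]
    return y

# A = [[1, 0, 2],
#      [0, 0, 0],     <- empty row
#      [3, 4, 5]],  x = [1, 2, 3]
print(csr_spmv_segsum([0, 2, 2, 5], [0, 2, 0, 1, 2], [1, 2, 3, 4, 5], [1, 2, 3]))
# → [7.0, 0.0, 26.0]
```

Because the products array has uniform length regardless of row sizes, this formulation sidesteps the load imbalance that plagues row-per-thread CSR kernels on irregular matrices.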
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance implementations of graph algorithms are challenging to
build on new parallel hardware such as GPUs for three reasons:
(1) the difficulty of devising graph building blocks, (2) load imbalance
on parallel hardware, and (3) the low arithmetic intensity of graph problems.
To address some of these challenges, GraphBLAS is an innovative, on-going
effort by the graph analytics community to propose building blocks based on
sparse linear algebra, which will allow graph algorithms to be expressed in a
performant, succinct, composable and portable manner. In this paper, we examine
the performance challenges of a linear-algebra-based approach to building graph
frameworks and describe new design principles for overcoming these bottlenecks.
Among the new design principles is exploiting input sparsity, which allows
users to write graph algorithms without specifying push and pull direction.
Exploiting output sparsity allows users to tell the backend which values of the
output in a single vectorized computation they do not want computed.
Load-balancing is an important feature for balancing work amongst parallel
workers. We describe the important load-balancing features for handling graphs
with different characteristics. The design principles described in this paper
have been implemented in "GraphBLAST", the first open-source high-performance
linear-algebra-based graph framework on NVIDIA GPUs. The results
show that on a single GPU, GraphBLAST has on average at least an order of
magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL,
comparable performance to the fastest GPU hardwired primitives and
shared-memory graph frameworks Ligra and Gunrock, and better performance than
any other GPU graph framework, while offering a simpler and more concise
programming model.
Comment: 50 pages, 14 figures, 14 tables.
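The linear-algebra view of graph algorithms that GraphBLAS promotes can be illustrated with BFS: each level expansion is a sparse matrix-vector product over the boolean (OR, AND) semiring, with the output masked by the complement of the visited set, which is the "exploiting output sparsity" idea described above. This plain dict/set sketch mimics the semantics, not the GraphBLAST API.

```python
# Illustrative sketch of BFS in the GraphBLAS style: the frontier is a
# sparse boolean vector, one level expansion is y = A^T * frontier over
# the (OR, AND) semiring, and the output is masked by the complement of
# the visited set so only newly reached vertices survive.
# Plain Python sets/dicts, not the GraphBLAST API.

def bfs_levels(adj, src):
    """adj[u] = iterable of out-neighbors; returns BFS level of each reached vertex."""
    level = {src: 0}          # visited set, with discovery depth
    frontier = {src}          # current sparse boolean vector
    depth = 0
    while frontier:
        depth += 1
        # masked SpMV: expand the frontier, drop already-visited vertices
        nxt = {v for u in frontier for v in adj.get(u, ())} - level.keys()
        for v in nxt:
            level[v] = depth
        frontier = nxt
    return level

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_levels(adj, 0))  # → {0: 0, 1: 1, 2: 1, 3: 2}
```

Because the frontier and the mask are both sparse, the work per iteration scales with the frontier size rather than the whole graph, which is the payoff of the input- and output-sparsity design principles.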