GPU in Physics Computation: Case Geant4 Navigation
General-purpose computing on graphics processing units (GPUs) is a potential
method of speeding up scientific computation with low cost and high energy
efficiency. We experimented with the particle physics simulation toolkit Geant4
used at CERN to benchmark its geometry navigation functionality on a GPU. The
goal was to find out whether Geant4 physics simulations could benefit from GPU
acceleration and how difficult it is to modify Geant4 code to run on a GPU.
We ported selected parts of Geant4 code to C99 & CUDA and implemented a
simple gamma physics simulation utilizing this code to measure efficiency. The
performance of the program was tested by running it on two different platforms:
NVIDIA GeForce 470 GTX GPU and a 12-core AMD CPU system. Our conclusion was
that GPUs can be a competitive alternative to multi-core computers, but porting
existing software in an efficient way is challenging.
Pattern classification using a linear associative memory
Pattern classification is a very important image processing task. A typical pattern classification algorithm can be broken into two parts: first, the pattern features are extracted and, second, these features are compared with a stored set of reference features until a match is found. In the second part, one of several clustering algorithms or similarity measures is usually applied. In this paper, a new application of linear associative memory (LAM) to pattern classification problems is introduced. Here, the clustering algorithms or similarity measures are replaced by a LAM matrix multiplication. With a LAM, the reference features need not be stored separately. Since the second part of most classification algorithms is similar, a LAM standardizes the many clustering algorithms and also allows for a standard digital hardware implementation. Computer simulations on regular textures using a feature extraction algorithm achieved a high percentage of successful classification. In addition, this classification is independent of topological transformations.
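The idea of replacing a similarity search with a single matrix multiplication can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the simplest correlation-type memory (a sum of outer products of class labels and reference features), orthonormal reference features, and one-hot class labels. The names `build_memory`, `classify`, `REFERENCE_FEATURES`, and `CLASS_LABELS` are hypothetical and chosen only for illustration.

```python
def build_memory(features, labels):
    """Sum the outer products label * feature^T into one memory matrix.

    Assumption: the reference features are orthonormal, so recall is exact
    for stored patterns. This is an illustrative choice, not the paper's.
    """
    rows, cols = len(labels[0]), len(features[0])
    memory = [[0.0] * cols for _ in range(rows)]
    for feature, label in zip(features, labels):
        for i, label_value in enumerate(label):
            for j, feature_value in enumerate(feature):
                memory[i][j] += label_value * feature_value
    return memory


def classify(memory, feature):
    """Multiply the memory matrix by the feature vector and pick the
    index of the largest response as the class."""
    scores = [sum(m * x for m, x in zip(row, feature)) for row in memory]
    return scores.index(max(scores))


# Hypothetical orthonormal reference features, one per class.
REFERENCE_FEATURES = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]
# One-hot labels: class k has a 1.0 in position k.
CLASS_LABELS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

MEMORY = build_memory(REFERENCE_FEATURES, CLASS_LABELS)

if __name__ == "__main__":
    noisy_feature = [0.4, -0.5, 0.6, -0.5]  # perturbed copy of class 1
    print(classify(MEMORY, noisy_feature))
```

Once the memory matrix is built, classification needs only the matrix itself; the individual reference features are no longer consulted, which matches the abstract's remark that they need not be stored separately.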
Automated sequence and motion planning for robotic spatial extrusion of 3D trusses
While robotic spatial extrusion has demonstrated a new and efficient means to
fabricate 3D truss structures at architectural scale, a major challenge remains
in automatically planning extrusion sequence and robotic motion for trusses
with unconstrained topologies. This paper presents the first attempt in the
field to rigorously formulate the extrusion sequence and motion planning (SAMP)
problem, using a CSP encoding. Furthermore, this research proposes a new
hierarchical planning framework to solve the extrusion SAMP problems that
usually have a long planning horizon and 3D configuration complexity. By
decoupling sequence and motion planning, the planning framework is able to
efficiently solve the extrusion sequence, end-effector poses, joint
configurations, and transition trajectories for spatial trusses with
nonstandard topologies. This paper also presents the first detailed computation
data revealing the runtime bottleneck in solving SAMP problems, which provides
insight and a baseline for comparison in future algorithmic development. Together
with the algorithmic results, this paper also presents an open-source and
modularized software implementation called Choreo that is machine-agnostic. To
demonstrate the power of this algorithmic framework, three case studies,
including real fabrication and simulation results, are presented. Comment: 24 pages, 16 figures
Parallelizing RRT on large-scale distributed-memory architectures
This paper addresses the problem of parallelizing the Rapidly-exploring Random Tree (RRT) algorithm on large-scale distributed-memory architectures, using the Message Passing Interface. We compare three parallel versions of RRT based on classical parallelization schemes. We evaluate them on different motion planning problems and analyze the various factors influencing their performance.
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present parallel algorithms for constructing and traversing sparse octrees
on graphics processing units (GPUs). The algorithms are based on parallel-scan
and sort methods. To test the performance and feasibility, we implemented them
in CUDA in the form of a gravitational tree-code that runs entirely on the
GPU. (The code is publicly available at:
http://castle.strw.leidenuniv.nl/software.html) The tree construction and
traversal algorithms are portable to many-core devices that support the
CUDA or OpenCL programming languages. The gravitational tree-code outperforms
tuned CPU code during the tree-construction and shows a performance improvement
of more than a factor of 20 overall, resulting in a processing rate of more than
2.8 million particles per second. Comment: Accepted version. Published in Journal of Computational Physics. 35
pages, 12 figures, single column