26,185 research outputs found

    GPU in Physics Computation: Case Geant4 Navigation

    Full text link
    General purpose computing on graphic processing units (GPU) is a potential method of speeding up scientific computation with low cost and high energy efficiency. We experimented with the particle physics simulation toolkit Geant4 used at CERN to benchmark its geometry navigation functionality on a GPU. The goal was to find out whether Geant4 physics simulations could benefit from GPU acceleration and how difficult it is to modify Geant4 code to run in a GPU. We ported selected parts of Geant4 code to C99 & CUDA and implemented a simple gamma physics simulation utilizing this code to measure efficiency. The performance of the program was tested by running it on two different platforms: NVIDIA GeForce 470 GTX GPU and a 12-core AMD CPU system. Our conclusion was that GPUs can be a competitive alternate for multi-core computers but porting existing software in an efficient way is challenging

    Pattern classification using a linear associative memory

    Get PDF
    Pattern classification is a very important image processing task. A typical pattern classification algorithm can be broken into two parts; first, the pattern features are extracted and, second, these features are compared with a stored set of reference features until a match is found. In the second part, usually one of the several clustering algorithms or similarity measures is applied. In this paper, a new application of linear associative memory (LAM) to pattern classification problems is introduced. Here, the clustering algorithms or similarity measures are replaced by a LAM matrix multiplication. With a LAM, the reference features need not be separately stored. Since the second part of most classification algorithms is similar, a LAM standardizes the many clustering algorithms and also allows for a standard digital hardware implementation. Computer simulations on regular textures using a feature extraction algorithm achieved a high percentage of successful classification. In addition, this classification is independent of topological transformations

    Automated sequence and motion planning for robotic spatial extrusion of 3D trusses

    Full text link
    While robotic spatial extrusion has demonstrated a new and efficient means to fabricate 3D truss structures in architectural scale, a major challenge remains in automatically planning extrusion sequence and robotic motion for trusses with unconstrained topologies. This paper presents the first attempt in the field to rigorously formulate the extrusion sequence and motion planning (SAMP) problem, using a CSP encoding. Furthermore, this research proposes a new hierarchical planning framework to solve the extrusion SAMP problems that usually have a long planning horizon and 3D configuration complexity. By decoupling sequence and motion planning, the planning framework is able to efficiently solve the extrusion sequence, end-effector poses, joint configurations, and transition trajectories for spatial trusses with nonstandard topologies. This paper also presents the first detailed computation data to reveal the runtime bottleneck on solving SAMP problems, which provides insight and comparing baseline for future algorithmic development. Together with the algorithmic results, this paper also presents an open-source and modularized software implementation called Choreo that is machine-agnostic. To demonstrate the power of this algorithmic framework, three case studies, including real fabrication and simulation results, are presented.Comment: 24 pages, 16 figure

    Parallelizing RRT on large-scale distributed-memory architectures

    Get PDF
    This paper addresses the problem of parallelizing the Rapidly-exploring Random Tree (RRT) algorithm on large-scale distributed-memory architectures, using the Message Passing Interface. We compare three parallel versions of RRT based on classical parallelization schemes. We evaluate them on different motion planning problems and analyze the various factors influencing their performance

    A sparse octree gravitational N-body code that runs entirely on the GPU processor

    Get PDF
    We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU.(The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single colum
    • 

    corecore