1,556 research outputs found

    A Parallel Rendering Algorithm for MIMD Architectures

    Applications such as animation and scientific visualization demand high-performance rendering of complex three-dimensional scenes. To deliver the necessary rendering rates, highly parallel hardware architectures are required. The challenge is then to design algorithms and software that use the hardware parallelism effectively. A rendering algorithm targeted to distributed-memory MIMD architectures is described. For maximum performance, the algorithm exploits both object-level and pixel-level parallelism. The behavior of the algorithm is examined both analytically and experimentally. Its performance for large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 shows increasing performance from 1 to 128 processors across a wide range of scene complexities. It is shown that minimal modifications to the algorithm will adapt it for use on shared-memory architectures as well.

    PERFORMANCE EVALUATION OF MEMORY AND COMPUTATIONALLY BOUND CHEMISTRY APPLICATIONS ON STREAMING GPGPUS AND MULTI-CORE X86 CPUS

    In recent years, multi-core processors have come to dominate desktop and high-performance computing. Graphics processors, traditionally used in CAD, video games, and other 3D applications, have become more programmable and are now suitable for general-purpose computing. This thesis explores the performance and limitations of multi-core processors and GPUs in two computational chemistry applications: a memory-bound component of ab initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, multiple processors are exploited using a variety of tools and languages, including OpenMP and MKL. The Brook+ and Compute Abstraction Layer streaming environments are used to accelerate applications on AMD GPUs. This thesis offers qualitative assessments of these languages and tools with regard to ease of use and optimization, in addition to quantitative analyses of performance. GPUs can yield modest performance improvements with little effort in some applications and even larger speedups with simple optimizations.

    GPGPU-Enabled Physics Based Deformed Model Simulation

    Computer simulation techniques are now widely adopted in areas such as manufacturing, engineering, graphics, animation, and virtual reality. However, standard finite-element-based simulation is notoriously expensive to compute. To address this challenge, I present a GPU-based parallel implementation for simulating large elastic deformation. Classic modal analysis provides a set of orthonormal basis vectors, which span a spectral space encoding the dynamics of the elastic body. Because the basis vectors are mutually orthogonal, the computation is completely decoupled and fits well onto modern GPGPU platforms. We further explore the latest features of NVIDIA CUDA so that the result of the GPU computation can be used directly for subsequent rendering and visualization, avoiding a significant amount of overhead for transmitting data between the client GPU and the host CPU via the PCI Express bus. This technique makes real-time simulation possible in many cases where it otherwise would not be.
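
    The decoupling described in the abstract above can be sketched in a few lines. This is a minimal illustration only, assuming undamped, unforced modes with positive frequencies; the function names and the closed-form update are assumptions for illustration and are not taken from the thesis, which does not spell out its formulation here. Each modal coordinate is evaluated without reference to any other, which is what makes a one-thread-per-mode mapping natural on a GPU.

```python
import math


def modal_coordinates(omegas, q0, v0, t):
    """Evaluate every modal coordinate at time t, each one independently.

    Assumed model (hypothetical, undamped and unforced):
        q_i(t) = q0_i * cos(w_i * t) + (v0_i / w_i) * sin(w_i * t)
    """
    return [
        q * math.cos(w * t) + (v / w) * math.sin(w * t)
        for w, q, v in zip(omegas, q0, v0)
    ]


def reconstruct(modes, q):
    """Map modal coordinates back to displacements: u = sum_i q_i * phi_i."""
    n = len(modes[0])
    return [sum(q_i * phi[j] for q_i, phi in zip(q, modes)) for j in range(n)]


# Two orthonormal basis vectors in a two-degree-of-freedom toy system.
modes = [[1.0, 0.0], [0.0, 1.0]]
omegas = [1.0, 2.0]

q_start = modal_coordinates(omegas, [1.0, 0.5], [0.0, 0.0], 0.0)
u_start = reconstruct(modes, q_start)
q_later = modal_coordinates(omegas, [1.0, 0.5], [0.0, 0.0], math.pi)
```

    Because no mode reads another mode's state, the list comprehension in `modal_coordinates` could be replaced by any parallel map without changing the result.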

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier
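
    The idea behind run-time code generation can be illustrated without a GPU. The sketch below is a plain-Python analogue, not the PyCUDA or PyOpenCL API: it builds source text from a template at run time, specializes it with a parameter known only at run time (here a hypothetical unroll factor), and compiles it with Python's built-in `compile` and `exec`. The template, `generate_axpy`, and the unroll parameter are all assumptions made for illustration.

```python
# Illustrative only: a CPU-side analogue of run-time code generation (RTCG).
# No PyCUDA or PyOpenCL call is used; all names below are hypothetical.

KERNEL_TEMPLATE = """
def axpy(a, xs, ys):
    out = []
    for i in range(0, len(xs), {unroll}):
{body}
    return out
"""


def generate_axpy(unroll):
    """Generate and compile an a*x + y routine unrolled `unroll` times."""
    lines = []
    for k in range(unroll):
        # Guard each unrolled step so lengths that are not a multiple
        # of the unroll factor are still handled correctly.
        lines.append(f"        if i + {k} < len(xs):")
        lines.append(f"            out.append(a * xs[i + {k}] + ys[i + {k}])")
    source = KERNEL_TEMPLATE.format(unroll=unroll, body="\n".join(lines))
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["axpy"], source


axpy4, source = generate_axpy(4)
result = axpy4(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0])
```

    In the GPU setting the generated text would be kernel source handed to the device compiler rather than Python source handed to `exec`, but the two-tier structure is the same: a scripting layer decides what code to emit, and a compiled layer runs it.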

    An Optimized Architecture for CGA Operations and Its Application to a Simulated Robotic Arm

    Conformal geometric algebra (CGA) is a new geometric computation tool that is attracting growing attention in many research fields, such as computer graphics, robotics, and computer vision. In robotics, new approaches based on CGA have been proposed to efficiently solve problems such as the inverse kinematics and grasping of a robotic arm. Hardware acceleration of CGA operations is required to meet real-time performance requirements on embedded robotic platforms. In this paper, we present a novel embedded coprocessor for accelerating CGA operations in robotic tasks. Two robotic algorithms, namely inverse kinematics and grasping for a human-arm-like kinematic chain, are used to demonstrate the effectiveness of the proposed approach. The coprocessor natively supports the entire set of CGA operations, including both basic operations (products, sums/differences, and unary operations) and complex operations such as rigid body motions (reflections, rotations, translations, and dilations). The coprocessor prototype is implemented on the Xilinx ML510 development platform as a complete system-on-chip (SoC), integrating both a PowerPC processing core and a CGA coprocessing core on the same Xilinx Virtex-5 FPGA chip. Experimental results show speedups of 78x and 246x for the inverse kinematics and grasping algorithms, respectively, relative to execution on the PowerPC processor.

    Graphics supercomputing applied to brain image analysis with NiftyReg

    Medical image processing in general, and brain image processing in particular, are computationally intensive tasks. Fortunately, they can be made more widely accessible through techniques such as GPU programming. In this article we study NiftyReg, a brain image processing library with a GPU implementation using CUDA, and analyse different possible ways of further optimising the existing code. We focus on making full use of the memory hierarchy and on exploiting the computational power of the CPU. The ideas that led us to the different attempts to change and optimise the code are presented as hypotheses, which we then test empirically using the results obtained from running the application. Finally, for each set of related optimisations we assess the validity of the obtained results in terms of both performance and the accuracy of the resulting images.

    Exploring boundaries in game processing
