15,779 research outputs found
Optimisation opportunities and evaluation for GPGPU applications on low-end mobile GPUs
Previous works in the literature have shown the feasibility of general-purpose computations for non-visual applications on low-end mobile graphics processors using graphics APIs. These works focused only on the functional aspects of the software, ignoring implementation details and hence the performance implications of the underlying micro-architecture. Since various steps in such applications can be implemented in multiple ways, we identify optimisation opportunities, explore the different options and evaluate them. We show that implementation details can significantly affect the obtained performance, with discrepancies of up to three orders of magnitude, and we demonstrate the effectiveness of our proposal on two embedded platforms, obtaining more than 16× speedup over benchmarks designed following OpenGL ES 2 best practices. This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence.
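The speedups reported here hinge on exactly these low-level implementation choices. One classic example of such a choice on OpenGL ES 2 class hardware (our illustration, not necessarily one of the paper's optimisations) is encoding floating-point results into RGBA8 textures, since ES 2.0 does not guarantee float render targets. A minimal NumPy sketch of that well-known pack/unpack scheme:

```python
import numpy as np

def pack_rgba8(v):
    """Pack floats in [0, 1) into 4 uint8 channels (RGBA8 texel),
    mirroring the classic GLSL fract()-based encoding."""
    v = np.asarray(v, dtype=np.float64)
    shifts = np.array([1.0, 255.0, 255.0**2, 255.0**3])
    enc = (v[..., None] * shifts) % 1.0              # per-channel fractional parts
    carry = np.concatenate([enc[..., 1:] / 255.0,    # remove bits kept by the
                            np.zeros(v.shape + (1,))], axis=-1)  # next channel
    return np.round((enc - carry) * 255.0).astype(np.uint8)

def unpack_rgba8(rgba):
    """Recover the float from the 4 channels."""
    weights = 1.0 / np.array([1.0, 255.0, 255.0**2, 255.0**3])
    return (rgba.astype(np.float64) / 255.0) @ weights

x = 0.5
print(unpack_rgba8(pack_rgba8(x)))   # ~0.5, to well beyond 8-bit precision
```

Whether such packing pays off, and where, is precisely the kind of micro-architecture-dependent question the evaluation above addresses.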
A Study of Speed of the Boundary Element Method as applied to the Realtime Computational Simulation of Biological Organs
In this work, the possibility of simulating biological organs in realtime using the Boundary Element Method (BEM) is investigated. Biological organs are assumed to follow linear elastostatic material behavior, and constant boundary elements are the element type used. First, a Graphics Processing Unit (GPU) is used to speed up the BEM computations to achieve realtime performance. Next, instead of the GPU, a computer cluster is used. Results indicate that BEM is fast enough to provide realtime graphics if biological organs are assumed to follow linear elastostatic material behavior. Although the present work does not conduct any simulation using nonlinear material models, results from using the linear elastostatic material model imply that it would be difficult to obtain realtime performance if highly nonlinear material models that properly characterize biological organs are used. Although the use of BEM for the simulation of biological organs is not new, the results presented in the present study are not found elsewhere in the literature.
Comment: preprint, draft, 2 tables, 47 references, 7 files. Codes that can solve three-dimensional linear elastostatic problems using constant boundary elements (of triangular shape) while ignoring body forces are provided as supplementary files; the codes are distributed under the MIT License in three versions: i) MATLAB version, ii) Fortran 90 version (sequential code), iii) Fortran 90 version (parallel code).
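A key property of the linear elastostatic assumption is that the dense BEM system matrix stays fixed, so it can be factorised once offline and only re-solved each frame. The sketch below (ours, under that assumption; the sizes are hypothetical and this is not necessarily the paper's pipeline) shows why this makes realtime rates plausible on stock hardware:

```python
import time
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Stand-in for the dense BEM matrix from constant-element collocation;
# a real code would assemble it by integrating the Kelvin fundamental
# solution over each triangular boundary element.
n = 3 * 500                                   # 3 dofs per boundary node in 3-D (toy size)
A = np.random.rand(n, n) + n * np.eye(n)      # dense, made well-conditioned for the demo

lu, piv = lu_factor(A)                        # offline: factorise once, O(n^3)

b = np.random.rand(n)                         # per-frame right-hand side (loads/BCs)
t0 = time.perf_counter()
x = lu_solve((lu, piv), b)                    # online: O(n^2) triangular solves per frame
dt = time.perf_counter() - t0
print(f"per-frame solve: {dt*1e3:.2f} ms (~{1.0/dt:.0f} fps budget)")
```

With a nonlinear material model the system matrix changes every step, which removes this precomputation and is one way to see why realtime performance becomes hard, consistent with the abstract's conclusion.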
A pilgrimage to gravity on GPUs
In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is now so popular that almost all papers about high-precision N-body simulations use methods that are accelerated by GPUs. With GPU hardware becoming more advanced and being used for more advanced algorithms, such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics.
Comment: To appear in: European Physical Journal "Special Topics": "Computer Simulations on Graphics Processing Units". 18 pages, 8 figures.
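The kernel that GPUs accelerate in direct-summation N-body codes is the all-pairs force sum. A minimal vectorised sketch of that kernel (ours; the softening parameter eps and G = 1 units are assumptions for the demo):

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """Direct-summation O(N^2) gravitational accelerations with Plummer
    softening; this all-pairs loop is exactly what GPU N-body codes
    parallelise, one body (or body-pair tile) per thread."""
    d = pos[None, :, :] - pos[:, None, :]          # pairwise separation vectors
    r2 = (d ** 2).sum(axis=-1) + eps ** 2          # softened squared distances
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                  # drop self-interaction
    return (d * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.standard_normal((1024, 3))
mass = np.full(1024, 1.0 / 1024)
acc = accelerations(pos, mass)                     # (1024, 3) array
```

Tree-codes mentioned above reduce this O(N^2) sum to roughly O(N log N) by approximating distant groups of bodies, at the cost of a far less regular, and harder to GPU-accelerate, memory access pattern.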
Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computation platforms has recently increased, and it is therefore necessary to optimize application performance and to enhance the scalability of massively parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code's efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
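When partitioning algorithms are compared for an unstructured solver, the usual figures of merit are the edge-cut (a proxy for inter-process halo communication) and the load imbalance across partitions. A small sketch of those two metrics (ours, illustrative only; not TAU's actual tooling):

```python
import numpy as np

def partition_quality(parts, edges):
    """Edge-cut and load imbalance of a mesh partition.
    parts: per-cell partition id (non-negative ints)
    edges: (m, 2) array of dual-graph edges between neighbouring cells"""
    cut = int((parts[edges[:, 0]] != parts[edges[:, 1]]).sum())
    sizes = np.bincount(parts)
    imbalance = sizes.max() / sizes.mean()   # 1.0 means perfectly even load
    return cut, imbalance

# toy usage: a 1-D chain of 8 cells split into two contiguous blocks
edges = np.array([[i, i + 1] for i in range(7)])
parts = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(partition_quality(parts, edges))       # (1, 1.0): one cut edge, even load
```

A lower edge-cut means less MPI traffic per iteration, while imbalance close to 1.0 keeps all ranks busy; good partitioners trade these two off.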
Massively Parallel Computing and the Search for Jets and Black Holes at the LHC
Massively parallel computing at the LHC could be the next leap necessary to reach an era of new discoveries after the Higgs discovery. Scientific computing is a critical component of the LHC experiments, including operation, trigger, the LHC Computing Grid, simulation, and analysis. One way to improve the physics reach of the LHC is to take advantage of the flexibility of the trigger system by integrating coprocessors based on Graphics Processing Units (GPUs) or the Many Integrated Core (MIC) architecture into its server farm. This cutting-edge technology provides not only the means to accelerate existing algorithms, but also the opportunity to develop new algorithms that select events in the trigger that would previously have evaded detection. In this article we describe new algorithms that would allow the trigger to select new topological signatures, including non-prompt jets and black-hole-like objects, in the silicon tracker.
Comment: 15 pages, 11 figures, submitted to NIM.
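As a toy illustration (ours, not the article's algorithms) of the kind of data-parallel selection such coprocessors run well, consider a vectorised filter that flags events with displaced, i.e. non-prompt, track signatures; the function name, threshold values, and input layout are all hypothetical:

```python
import numpy as np

def nonprompt_trigger(d0, event_id, n_events, d0_cut=1.0, min_tracks=2):
    """Toy displaced-track trigger: accept events with at least
    `min_tracks` tracks whose transverse impact parameter |d0| (mm)
    exceeds `d0_cut`. Flat, branch-free work over all tracks at once,
    the style of per-event data parallelism that maps well onto a GPU."""
    displaced = np.abs(d0) > d0_cut
    counts = np.bincount(event_id[displaced], minlength=n_events)
    return counts >= min_tracks

# toy usage: 4 events with 3, 5, 2 and 6 reconstructed tracks
rng = np.random.default_rng(1)
event_id = np.repeat(np.arange(4), [3, 5, 2, 6])
d0 = rng.normal(0.0, 0.8, size=event_id.size)
print(nonprompt_trigger(d0, event_id, n_events=4))   # per-event accept mask
```

A production trigger would of course operate on reconstructed tracks under tight latency budgets; the point here is only the shape of the computation, many independent events evaluated in parallel.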