7 research outputs found

    CUDA DSP Filter for ECG Signals

    Real-time processing is critical for the analysis of ECG signals. Before any processing, the signal must be filtered to enable feature extraction and further analysis. In a data processing center that analyzes thousands of connected ECG sensors, signal processing must be performed very fast. In this paper, we focus on parallelizing a sequential DSP filter for heart-signal processing on GPU cores. We hypothesize that the GPU version is much faster than the CPU version. We provide several experiments to test this hypothesis and to compare the performance of the parallelized GPU code with the sequential code. Assuming the hypothesis holds, we also determine the optimal number of threads per block that yields the maximum speedup. Our analysis shows that the parallelized GPU code achieves linear speedups and is much more efficient than classical single-processor sequential processing.
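    The abstract above does not include the paper's code; as a hedged illustration only, a GPU DSP filter of this kind is often written as one thread per output sample, with the threads-per-block count as the tunable parameter the paper studies. The FIR kernel, coefficient values, and signal length below are all assumptions, not the authors' implementation:

```cuda
// Hypothetical sketch: each thread computes one output sample of a
// simple FIR filter over an ECG trace. Illustrative only.
#include <cuda_runtime.h>

#define TAPS 5

__constant__ float d_coeff[TAPS];  // filter coefficients in fast constant memory

__global__ void firFilter(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < TAPS; ++k) {
        int j = i - k;
        if (j >= 0) acc += d_coeff[k] * in[j];  // zero-pad before sample 0
    }
    out[i] = acc;
}

int main() {
    const int n = 1 << 20;  // one simulated ECG trace
    const float h_coeff[TAPS] = {0.2f, 0.2f, 0.2f, 0.2f, 0.2f};
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));  // placeholder input signal
    cudaMemcpyToSymbol(d_coeff, h_coeff, sizeof(h_coeff));
    // The "threads per block" value is the parameter the paper tunes
    // for maximum speedup; 256 is a common starting point.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    firFilter<<<blocks, threadsPerBlock>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

    Varying `threadsPerBlock` (e.g. 64, 128, 256, 512, 1024) while timing the kernel is the kind of experiment the hypothesis test above describes.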

    Use of CUDA for the Continuous Space Language Model

    The training phase of the Continuous Space Language Model (CSLM) was implemented in NVIDIA's hardware/software architecture, the Compute Unified Device Architecture (CUDA). The implementation combines CUBLAS library routines and CUDA kernel calls on three CUDA-enabled devices of varying compute capability, and a time savings over the traditional CPU approach is demonstrated.
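    CSLM training is dominated by dense matrix products in the network's projection and hidden layers, which is why CUBLAS routines apply. As a hedged sketch only (the layer dimensions and call pattern are assumptions, not taken from the paper), one forward-pass layer reduces to a single `cublasSgemm`:

```cuda
// Hypothetical sketch: a CSLM dense layer Y = W * X as a CUBLAS GEMM.
// Dimensions are illustrative, not the paper's configuration.
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int batch = 128, in_dim = 320, out_dim = 500;
    float *d_W, *d_X, *d_Y;
    cudaMalloc(&d_W, out_dim * in_dim * sizeof(float));
    cudaMalloc(&d_X, in_dim * batch * sizeof(float));
    cudaMalloc(&d_Y, out_dim * batch * sizeof(float));
    cudaMemset(d_W, 0, out_dim * in_dim * sizeof(float));  // placeholder weights
    cudaMemset(d_X, 0, in_dim * batch * sizeof(float));    // placeholder inputs

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // Column-major GEMM: Y(out_dim x batch) = W(out_dim x in_dim) * X(in_dim x batch)
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                out_dim, batch, in_dim,
                &alpha, d_W, out_dim,
                d_X, in_dim,
                &beta, d_Y, out_dim);

    cublasDestroy(handle);
    cudaFree(d_W);
    cudaFree(d_X);
    cudaFree(d_Y);
    return 0;
}
```

    Nonlinearities and the backward pass would be the "CUDA kernel calls" the abstract mentions alongside the CUBLAS routines.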

    Accelerating Stencil Computation on GPGPU by Novel Mapping Method Between the Global Memory and the Shared Memory

    Stencil computation can be accelerated effectively by exploiting the GPU memory hierarchy. In this paper, to reduce the branch divergence of the traditional mapping between global memory and shared memory, we devise a new mapping mechanism in which the conditional statements that load the boundary stencil points of each XY-tile are removed by aligning the ghost zone, which also reduces synchronization overhead. In addition, we make full use of the single XY-tile loaded into registers at every stencil point, common sub-expression elimination, and software prefetching to reduce overhead. Finally, a detailed performance evaluation demonstrates that our optimization policies are close to optimal in terms of memory bandwidth utilization and achieve higher stencil computation performance.
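    For context on the problem the abstract targets, the traditional global-to-shared mapping for a tiled stencil loads a halo ("ghost zone") with per-thread conditionals, and those branches diverge at tile edges. The baseline sketch below, a hedged illustration and not the paper's code, shows exactly the conditional boundary loads the proposed mapping removes:

```cuda
// Hypothetical baseline sketch: traditional shared-memory tiling for a
// 2D 5-point stencil. The four "if" halo loads are the branch-divergent
// statements the paper's aligned ghost-zone mapping eliminates.
#define BX 16
#define BY 16

__global__ void stencil5(const float *in, float *out, int nx, int ny) {
    __shared__ float tile[BY + 2][BX + 2];  // XY-tile plus one-cell halo
    int gx = blockIdx.x * BX + threadIdx.x;
    int gy = blockIdx.y * BY + threadIdx.y;
    int lx = threadIdx.x + 1, ly = threadIdx.y + 1;
    if (gx >= nx || gy >= ny) return;

    tile[ly][lx] = in[gy * nx + gx];
    // Branch-divergent ghost-zone loads in the traditional mapping:
    if (threadIdx.x == 0 && gx > 0)
        tile[ly][0] = in[gy * nx + gx - 1];
    if (threadIdx.x == BX - 1 && gx < nx - 1)
        tile[ly][BX + 1] = in[gy * nx + gx + 1];
    if (threadIdx.y == 0 && gy > 0)
        tile[0][lx] = in[(gy - 1) * nx + gx];
    if (threadIdx.y == BY - 1 && gy < ny - 1)
        tile[BY + 1][lx] = in[(gy + 1) * nx + gx];
    __syncthreads();

    if (gx > 0 && gx < nx - 1 && gy > 0 && gy < ny - 1)
        out[gy * nx + gx] = 0.2f * (tile[ly][lx] +
                                    tile[ly][lx - 1] + tile[ly][lx + 1] +
                                    tile[ly - 1][lx] + tile[ly + 1][lx]);
}
```

    In the traditional form every warp at a tile boundary takes a different path through these conditionals, which is the divergence and synchronization cost the new mapping is designed to avoid.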

    Analyzing Communication Models for Distributed Thread-Collaborative Processors in Terms of Energy and Time

    Accelerated computing has become pervasive for increasing computational power and energy efficiency in terms of GFLOPs/Watt. For application areas with the highest demands, for instance high performance computing, data warehousing, and high performance analytics, accelerators like GPUs or Intel's MICs are distributed throughout the cluster. Since current analyses and predictions show that data movement will be the main contributor to energy consumption, we are entering an era of communication-centric heterogeneous systems that operate under hard power constraints. In this work, we analyze data movement optimizations for distributed heterogeneous systems based on CPUs and GPUs. Thread-collaborative processors like GPUs differ significantly in their execution model from general-purpose processors like CPUs, but available communication models are still designed and optimized for CPUs. Similar to heterogeneity in processing, heterogeneity in communication can have a huge impact on energy and time. To analyze this impact, we use multiple workloads with distinct properties regarding computational intensity and communication characteristics. We show for which workloads tailored communication models are essential, not only reducing execution time but also saving energy. Exposing the impact on energy and time for communication-centric heterogeneous systems is crucial for future optimizations, and this work is a first step in that direction.
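    One concrete instance of a communication model tailored to GPUs, offered here as a hedged illustration rather than the paper's method, is replacing a single pageable, synchronous host-device copy with pinned memory, streams, and chunked asynchronous transfers so that data movement overlaps computation:

```cuda
// Hypothetical sketch: overlapping transfers and kernels with pinned
// memory and two CUDA streams. Illustrative of a "tailored" model only.
#include <cuda_runtime.h>

__global__ void compute(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;  // stand-in for real per-element work
}

int main() {
    const int n = 1 << 22, half = n / 2;
    float *h_buf, *d_buf;
    cudaMallocHost(&h_buf, n * sizeof(float));  // pinned host allocation
    cudaMalloc(&d_buf, n * sizeof(float));
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);
    // Split the work so chunk 1's copy-in overlaps chunk 0's kernel.
    for (int c = 0; c < 2; ++c) {
        int off = c * half;
        cudaMemcpyAsync(d_buf + off, h_buf + off, half * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        compute<<<(half + 255) / 256, 256, 0, s[c]>>>(d_buf + off, half);
        cudaMemcpyAsync(h_buf + off, d_buf + off, half * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();
    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```

    Whether such overlap pays off in time and energy depends on the workload's computational intensity and communication pattern, which is the trade-off the study above measures.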