10,710 research outputs found

Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling

Author: Cao Yong
Park Seung In
Quek Francis
Watson Layne T.
Publication date: 01/01/2012

Though the GPGPU concept is well known in image processing, much work remains to be done to fully exploit GPUs as an alternative computation engine. This paper investigates computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme shows a significant performance gain over the standard pixel-wise mapping scheme. With in-depth performance comparisons across the two mapping schemes, we analyze the impact of the level of parallelism on the GPU computation and suggest two principles for optimizing future image processing applications on the GPU platform.

Computer Science Technical Reports @Virginia Tech

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication date: 01/01/2014

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of Earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing the power consumption of embedded systems. We discuss the need for power management and classify the techniques along several important parameters to highlight their similarities and differences. This paper is intended to help researchers and application developers gain insight into the working of power management techniques and design even more efficient high-performance embedded systems.

arXiv.org e-Print Archive

Crossref

Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary

Author: Lala J. H.
Smith T. B., III

The FTMP architecture is a high-reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another, and hardware fault detection and fault masking are provided in a way that is transparent to the software. Operating system design and user software design are thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of the fault-handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine, and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.

NASA Technical Reports Server

GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA

Author: Gardiner
Reimann
Hänggi
Jülicher
Kay
Hänggi
Binder
Kloeden
Platen
Januszewski
Seibert
Barros
Polyakov
Januszewski
Hänggi
Astumian
Hänggi
Risken
Łuczka
Hänggi
Spiechowicz
Spiechowicz
Spiechowicz
Czernik
Łuczka
Kula
Kula
Kostur
Łuczka
Kula
Kim
Grigoriu
Palleschi
Kim
Publication venue: Elsevier BV
Publication date: 01/01/2004

This work presents an updated and extended guide to methods for accelerating the Monte Carlo integration of stochastic differential equations on commonly available NVIDIA Graphics Processing Units using the CUDA programming environment. We outline the general aspects of scientific computing on graphics cards and demonstrate them with two models of the well-known phenomenon of noise-induced transport of Brownian motors in periodic structures. As sources of fluctuations in the considered systems we selected the three most commonly occurring noises: Gaussian white noise, white Poissonian noise, and the dichotomous process, also known as a random telegraph signal. A detailed discussion of various aspects of the applied numerical schemes is also presented. The measured speedup can be of the astonishing order of about 3000 when compared to a typical CPU. This number significantly expands the range of problems solvable by stochastic simulations, even allowing interactive research in some cases. Comment: 21 pages, 5 figures; Comput. Phys. Commun., accepted, 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Birmingham Research Portal
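
The Monte Carlo integration of a stochastic differential equation mentioned in the Brownian motors abstract above can be illustrated with a minimal CPU sketch. It is not the authors' CUDA code: the potential `amplitude * sin(2*pi*x)`, the constant bias `force`, the Euler-Maruyama scheme, and every parameter name are assumptions chosen for illustration, and only Gaussian white noise is covered. The point it shows is that each trajectory is independent of the others, which is what allows one trajectory to be assigned to one GPU thread.

```python
import math
import random


def simulate_mean_velocity(n_paths=200, n_steps=2000, dt=1e-3, force=1.0,
                           amplitude=1.0, noise_strength=0.1, seed=0):
    """Estimate the mean velocity of an overdamped particle by Monte Carlo.

    Hypothetical model (not taken from the paper):
        dx = [-V'(x) + force] dt + sqrt(2 * noise_strength) dW,
        V(x) = amplitude * sin(2 * pi * x).
    """
    rng = random.Random(seed)
    # Standard deviation of the Gaussian increment in one Euler-Maruyama step.
    sigma = math.sqrt(2.0 * noise_strength * dt)
    total_displacement = 0.0
    # The outer loop runs over independent trajectories; on a GPU each
    # iteration would typically be handled by its own thread.
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            drift = -amplitude * 2.0 * math.pi * math.cos(2.0 * math.pi * x) + force
            x += drift * dt + sigma * rng.gauss(0.0, 1.0)
        total_displacement += x
    # Average displacement divided by the elapsed time gives the mean velocity.
    return total_displacement / (n_paths * n_steps * dt)
```

With `amplitude=0.0` the potential vanishes and the estimate should fluctuate around `force`, which gives a quick sanity check before trying other parameters.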

Solving the Ghost-Gluon System of Yang-Mills Theory on GPUs

Author: Aguilar
Alkofer
Alkofer
Alkofer
Atkinson
Boucaud
Cucchieri
Dyson
Fischer
Fischer
Fischer
Fischer
Fister
Glimm
Gribov
Gundolf Haase
Haag
Huber
Kugo
Lerche
Maas
Maas
Maas
Maas
Mandelstam
Maris
Markus Hopfer
Nakanishi
NVIDIA Corporation
NVIDIA Corporation
Osterwalder
Pawlowski
Reinhard Alkofer
Schwinger
Schwinger
Sternbeck
Sternbeck
Takahasi
Taylor
von Smekal
von Smekal
von Smekal
Watson
Zwanziger
Zwanziger
Publication venue: Elsevier BV
Publication date: 18/12/2012

We solve the ghost-gluon system of Yang-Mills theory using Graphics Processing Units (GPUs). Working in Landau gauge, we use the Dyson-Schwinger formalism for the mathematical description, as this approach is well suited to benefit directly from the computing power of GPUs. With the help of a Chebyshev expansion for the dressing functions and a subsequent application of a Newton-Raphson method, the non-linear system of coupled integral equations is linearized. The resulting Newton matrix is generated in parallel using OpenMPI and CUDA(TM). Our results show that it is possible to cut the run time by two orders of magnitude compared to a sequential version of the code. This makes the proposed techniques well suited for Dyson-Schwinger calculations on more complicated systems where the Yang-Mills sector of QCD serves as a starting point. In addition, the computation of Schwinger functions using GPU devices is studied. Comment: 19 pages, 7 figures, additional figure added, dependence on block-size is investigated in more detail, version accepted by CP

arXiv.org e-Print Archive

Crossref
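
The combination of a Chebyshev expansion with a Newton-Raphson iteration named in the ghost-gluon abstract above can be sketched on a toy problem. The sketch does not solve the Dyson-Schwinger equations: it assumes a simple pointwise equation f(x)^3 + f(x) = g(x) on [-1, 1] as a stand-in for the coupled integral equations, and all function names are hypothetical. What carries over is the structure: the unknown function is represented by Chebyshev coefficients, and each row of the Newton matrix depends on a single collocation point, so the rows can be built independently (and hence in parallel).

```python
import math


def cheb_T(k, x):
    """Chebyshev polynomial T_k(x) = cos(k * arccos(x)) for x in [-1, 1]."""
    return math.cos(k * math.acos(x))


def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_chebyshev(g, n=8, iterations=20, tol=1e-12):
    """Find Chebyshev coefficients of f with f(x)**3 + f(x) = g(x) (toy equation)."""
    # Collocation points: the n Chebyshev nodes on (-1, 1).
    nodes = [math.cos(math.pi * (i + 0.5) / n) for i in range(n)]
    basis = [[cheb_T(k, x) for k in range(n)] for x in nodes]
    coeffs = [0.0] * n
    for _ in range(iterations):
        values = [sum(c * t for c, t in zip(coeffs, row)) for row in basis]
        residual = [v ** 3 + v - g(x) for v, x in zip(values, nodes)]
        if max(abs(r) for r in residual) < tol:
            break
        # Each Jacobian row uses only its own node, so rows are independent.
        jacobian = [[(3.0 * v * v + 1.0) * t for t in row]
                    for v, row in zip(values, basis)]
        delta = solve_linear(jacobian, [-r for r in residual])
        coeffs = [c + d for c, d in zip(coeffs, delta)]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the Chebyshev series with the given coefficients at x."""
    return sum(c * cheb_T(k, x) for k, c in enumerate(coeffs))
```

For example, calling `newton_chebyshev(lambda x: x ** 3 + x)` should recover coefficients close to those of f(x) = x, since that function satisfies the toy equation exactly.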