FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels
Using FPGAs as hardware accelerators that communicate with a central CPU is becoming common practice in the embedded design world, but there is not yet a standard methodology and toolset to facilitate this path. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphics Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low-power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach.
A GPU-based Evolution Strategy for Optic Disk Detection in Retinal Images
Parallel processing using graphics processing units (GPUs) has attracted much research interest in recent years. Parallel computation can be applied to evolution strategies (ES) to process the individuals in a population; however, evolution strategies consume significant computational resources when solving large problems or problems modeled with complex fitness functions. In this paper we describe the implementation of an improved ES for optic disk detection in retinal images using the Compute Unified Device Architecture (CUDA) environment. The experimental results show that the execution time for the optic disk detection task achieves a speedup of 5x to 7x compared to a sequential implementation on a mainstream CPU.
GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA
This work presents an updated and extended guide to methods for properly
accelerating the Monte Carlo integration of stochastic differential
equations with the commonly available NVIDIA Graphics Processing Units using
the CUDA programming environment. We outline the general aspects of
scientific computing on graphics cards and demonstrate them with two models of
the well-known phenomenon of noise-induced transport of Brownian motors in
periodic structures. As a source of fluctuations in the considered systems we
selected the three most commonly occurring noises: the Gaussian white noise,
the white Poissonian noise and the dichotomous process also known as a random
telegraph signal. A detailed discussion of various aspects of the applied
numerical schemes is also presented. The measured speedup can be of the
astonishing order of about 3000 when compared to a typical CPU. This number
significantly expands the range of problems solvable by stochastic
simulations, even allowing interactive research in some cases.
Comment: 21 pages, 5 figures; Comput. Phys. Commun., accepted, 201