508 research outputs found

Markov chain Monte Carlo on the GPU

Author: Dumont Michael
Publication venue: RIT Scholar Works
Publication date: 01/01/2011

Markov chains are a useful tool in statistics that allow us to sample and model a large population of individuals. We can extend this idea to the challenge of sampling solutions to problems. Using Markov chain Monte Carlo (MCMC) techniques we can also attempt to approximate the number of solutions with a certain confidence based on the number of samples we use to compute our estimate. Even though this approximation works very well for getting accurate results for very large problems, it is still computationally intensive. Many of the current algorithms use parallel implementations to improve their performance. Modern day graphics processing units (GPUs) have been increasing in computational power very rapidly over the past few years. Due to their inherently parallel nature and increased flexibility for general purpose computation, they lend themselves very well to building a framework for general purpose Markov chain simulation and evaluation. In addition, the majority of mid- to high-range workstations have graphics cards capable of supporting modern day general purpose GPU (GPGPU) frameworks such as OpenCL, CUDA, or DirectCompute. This thesis presents work done to create a general purpose framework for Markov chain simulations and Markov chain Monte Carlo techniques on the GPU using the OpenCL toolkit. OpenCL is a GPGPU framework that is platform and hardware independent, which will further increase the accessibility of the software. Due to the increasing power, flexibility, and prevalence of GPUs, a wider range of developers and researchers will be able to take advantage of a high performing general purpose framework in their research. A number of experiments are also conducted to demonstrate the benefits and feasibility of using the power of the GPU to solve Markov chain Monte Carlo problems.
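
To illustrate why Markov chain simulation parallelizes well, here is a minimal CPU-only sketch in Python. It is not the thesis framework and uses no OpenCL. It runs many independent chains over a small transition matrix and pools their samples to estimate the stationary distribution. The matrix `P`, the function names, and the chain, step, and burn-in counts are all assumptions chosen for illustration. Each chain is independent of the others, which is the property that would let one chain map onto one GPU work-item.

```python
import random

def simulate_chain(transition, start, steps, rng):
    """Run one Markov chain and return the state visited at each step."""
    state = start
    visited = []
    for _ in range(steps):
        # Draw the next state from the transition row of the current state.
        r = rng.random()
        cumulative = 0.0
        for nxt, p in enumerate(transition[state]):
            cumulative += p
            if r < cumulative:
                state = nxt
                break
        visited.append(state)
    return visited

def estimate_stationary(transition, chains, steps, burn_in, seed=0):
    """Pool samples from independent chains into one frequency estimate."""
    counts = [0] * len(transition)
    for c in range(chains):
        # Each chain gets its own generator, so chains never share state.
        rng = random.Random(seed + c)
        visited = simulate_chain(transition, 0, steps, rng)
        # Discard the first burn_in samples, taken before the chain has mixed.
        for s in visited[burn_in:]:
            counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical two-state chain; its exact stationary distribution is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
estimate = estimate_stationary(P, chains=64, steps=2000, burn_in=200)
print(estimate)
```

Increasing the number of chains or steps tightens the estimate, which mirrors the abstract's point that confidence depends on the number of samples.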

RIT Scholar Works

A GPU Powered Mobile AR Navigation System

Author: Zhao Mengshen
Publication venue: eGrove
Publication date: 01/01/2015

This thesis presents a real-time Augmented Reality Navigation System (ARNavi) for Android smartphones that leverages the parallel computing power of mobile GPUs. Unlike conventional navigation systems, the proposed ARNavi overlays navigation information onto the live video stream from the device camera in real time. To achieve this goal, we implement and accelerate the compute-intensive parts of the application using OpenCL on the GPU integrated in the mobile Application Processor (AP). The contributions of this thesis are three-fold. First, we propose a new lane detection algorithm and a prediction mechanism based on geometric coordinates. The results show that these two algorithms are fast and accurate. Second, we port and accelerate a complete application on a mobile AP. By taking advantage of CPU-GPU heterogeneous computing techniques, we achieve a speedup of more than 2.6 times over the CPU-only version. Lastly, we successfully integrate OpenCL and OpenCV on the Android platform.

eGrove (Univ. of Mississippi)

A Language and Hardware Independent Approach to Quantum-Classical Computing

Author: Chen Mengsu
Dumitrescu Eugene F.
Feng Wu-chun
Humble Travis S.
Liakh Dmitry
McCaskey Alexander J.
Publication date: 01/01/2018

Heterogeneous high-performance computing (HPC) systems offer novel architectures which accelerate specific workloads through judicious use of specialized coprocessors. A promising architectural approach for future scientific computations is provided by heterogeneous HPC systems integrating quantum processing units (QPUs). To this end, we present XACC (eXtreme-scale ACCelerator), a programming model and software framework that enables quantum acceleration within standard or HPC software workflows. XACC follows a coprocessor machine model that is independent of the underlying quantum computing hardware, thereby enabling quantum programs to be defined and executed on a variety of QPU types through a unified application programming interface. Moreover, XACC defines a polymorphic low-level intermediate representation and an extensible compiler frontend that enables language independent quantum programming, thus promoting integration and interoperability across the quantum programming landscape. In this work we define the software architecture enabling our hardware and language independent approach, and demonstrate its usefulness across a range of quantum computing models through illustrative examples involving the compilation and execution of gate and annealing-based quantum programs.

arXiv.org e-Print Archive

Directory of Open Access Journals

Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems

Author: Beivide Palacio Ramón
Bosque Orero José Luis
Pérez Pavón Borja
Stafford Fernández Esteban
Publication venue: Elsevier BV
Publication date: 01/11/2021

A challenge that heterogeneous system programmers face is leveraging the performance of all the devices that make up the system. This paper presents Sigmoid, a new load balancing algorithm that efficiently co-executes a single OpenCL data-parallel kernel on all the devices of heterogeneous systems. Sigmoid splits the workload proportionally to the capabilities of the devices, drastically reducing response time and energy consumption. It is designed around several features: it is dynamic, adaptive, guided and effortless, as it does not require the user to give any parameter, adapting to the behaviour of each kernel at runtime. To evaluate Sigmoid's performance, it has been implemented in Maat, a system abstraction library. Experimental results with different kernel types show that Sigmoid exhibits excellent performance, reaching a utilization of 90%, together with energy savings of up to 20%, always reducing programming effort compared to OpenCL and facilitating portability to other heterogeneous machines.

This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence.
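
The idea of splitting a workload in proportion to device capability can be sketched in a few lines of Python. This is a static, one-shot split and not the Sigmoid algorithm itself, which the abstract describes as dynamic and adaptive at runtime. The function name `split_proportionally` and the integer throughput figures are hypothetical, chosen only for illustration.

```python
def split_proportionally(total_items, throughputs):
    """Divide total_items among devices in proportion to integer throughputs."""
    total_throughput = sum(throughputs)
    # Integer floor share for every device.
    shares = [total_items * t // total_throughput for t in throughputs]
    # Flooring can leave a few items unassigned; give them out one at a
    # time, starting with the fastest devices.
    leftover = total_items - sum(shares)
    order = sorted(range(len(throughputs)),
                   key=lambda i: throughputs[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares

# Hypothetical relative throughputs for a GPU, a second GPU, and a CPU.
shares = split_proportionally(1000, [6, 3, 1])
print(shares)  # [600, 300, 100]
```

A static split like this depends on knowing the throughputs in advance. An adaptive scheduler would instead measure them while the kernel runs.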

UCrea

HIGH PERFORMANCE COMPUTING FOR RECONNAISSANCE APPLICATIONS

Author: Stevens Christopher J.
Publication venue: Monterey, California. Naval Postgraduate School
Publication date: 01/06/2014

Parallel programming is vital to fully utilize the multicore architectures that dominate the processor market. The market, however, is constantly evolving, with new processors and new architectures being released annually. Using an open parallel processing language, such as OpenCL (Open Computing Language), enables the use of a single program across multiple architectures. It also provides a way to compare multiple devices so that the best choice can be made for a given application. In this research, OpenCL is used to evaluate the performance of two signal processing algorithms across two graphics processing units and one central processing unit. Experimental results show that, for each algorithm, one specific device clearly outperforms the others.

Ensign, United States Navy. Approved for public release; distribution is unlimited.

Calhoun, Institutional Archive of the Naval Postgraduate School

ImageJ2: ImageJ for the next generation of scientific image data

Author: Arena Ellen T.
DeZonia Barry E.
Eliceiri Kevin W.
Hiner Mark C.
Rueden Curtis T.
Schindelin Johannes
Walter Alison E.
Publication date: 01/11/2017

ImageJ is an image analysis program extensively used in the biological sciences and beyond. Due to its ease of use, recordable macro language, and extensible plugin architecture, ImageJ enjoys contributions from non-programmers, amateur programmers, and professional developers alike. Enabling such a diversity of contributors has resulted in a large community that spans the biological and physical sciences. However, a rapidly growing user base, diverging plugin suites, and technical limitations have revealed a clear need for a concerted software engineering effort to support emerging imaging paradigms, to ensure the software's ability to handle the requirements of modern science. Due to these new and emerging challenges in scientific imaging, ImageJ is at a critical development crossroads. We present ImageJ2, a total redesign of ImageJ offering a host of new functionality. It separates concerns, fully decoupling the data model from the user interface. It emphasizes integration with external applications to maximize interoperability. Its robust new plugin framework allows everything from image formats, to scripting languages, to visualization to be extended by the community. The redesigned data model supports arbitrarily large, N-dimensional datasets, which are increasingly common in modern image acquisition. Despite the scope of these changes, backwards compatibility is maintained such that this new functionality can be seamlessly integrated with the classic ImageJ interface, allowing users and developers to migrate to these new methods at their own pace. ImageJ2 provides a framework engineered for flexibility, intended to support these requirements as well as accommodate future needs.

arXiv.org e-Print Archive

Directory of Open Access Journals

Parallel For Loops on Heterogeneous Resources

Author: Weber Frederick Edward
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2012

In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal for not just graphical applications, but many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine is a difficult challenge, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil's ParallelFor construct and our novel PINA scheduler, which can efficiently load balance work onto multiple GPUs and CPUs simultaneously.
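
As a rough illustration of what a load-balanced parallel for loop does, the Python sketch below splits an index range into chunks and lets worker threads pull chunks from a shared queue, so faster workers naturally take more of them. It is not clUtil's ParallelFor API or the PINA scheduler, whose details the abstract does not give. The `parallel_for` signature, the chunk size, and the worker count are assumptions made for this example, and the threads stand in for heterogeneous devices.

```python
import queue
import threading

def parallel_for(start, stop, chunk, body, workers):
    """Apply body(i) to every i in [start, stop) using dynamic chunking."""
    chunks = queue.Queue()
    for lo in range(start, stop, chunk):
        chunks.put((lo, min(lo + chunk, stop)))
    processed = [0] * workers  # iterations handled by each worker

    def worker(worker_id):
        while True:
            try:
                lo, hi = chunks.get_nowait()
            except queue.Empty:
                return  # no work left
            for i in range(lo, hi):
                body(i)
            processed[worker_id] += hi - lo

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed

squares = [0] * 100

def store_square(i):
    # Each index is written by exactly one worker, so no lock is needed.
    squares[i] = i * i

processed = parallel_for(0, 100, 8, store_square, workers=4)
print(processed)
```

The per-worker counts in `processed` vary from run to run, because the split is decided at runtime and not fixed in advance.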

University of Tennessee, Knoxville: Trace

Geometry based visualization with OpenCL

Author: Rogeiro João Pedro Martins
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2011

Dissertation submitted for the degree of Master in Computer Engineering (Engenharia Informática).

This work targets the design and implementation of an isosurface extraction solution capable of handling large datasets. The Marching Cubes algorithm is the method used to extract the isosurfaces. These are graphical representations of points with a constant value (e.g. matter density) within volumetric datasets, which makes them a very useful way to visualize particular regions of such data. One of the major goals of this work is to obtain a significant performance improvement over the currently available CPU solutions. The OpenCL framework, a recently proposed open standard for parallel programming of heterogeneous systems, is used to accelerate the solution. Unlike previous programming frameworks for GPUs such as CUDA, with OpenCL the workload can be distributed among CPUs, GPUs, DSPs, and other similar microprocessors.
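
The per-cell work in Marching Cubes can be sketched briefly in Python. For each cube of eight sampled values, the algorithm builds an 8-bit index recording which corners lie on one side of the isovalue, then places surface vertices by linear interpolation along the edges the surface crosses. The sketch below covers only those two steps and omits the triangle lookup table. The corner ordering and the "bit set when below the isovalue" convention are assumptions, not details taken from the dissertation. Each cube is processed independently, which is what makes the algorithm a natural fit for data-parallel execution.

```python
def cube_index(corner_values, isovalue):
    """Return an 8-bit mask with bit k set when corner k is below isovalue."""
    index = 0
    for bit, value in enumerate(corner_values):
        if value < isovalue:
            index |= 1 << bit
    return index

def interpolate_crossing(p0, p1, v0, v1, isovalue):
    """Linearly interpolate the point on edge p0-p1 where the field equals isovalue."""
    if v0 == v1:
        # Degenerate edge: fall back to the midpoint to avoid dividing by zero.
        t = 0.5
    else:
        t = (isovalue - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

# Hypothetical cell: only corner 0 lies below the isovalue.
index = cube_index([0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], 0.5)
vertex = interpolate_crossing((0, 0, 0), (1, 0, 0), 0.0, 1.0, 0.25)
print(index, vertex)
```

An index of 0 or 255 means the surface does not pass through the cube, so such cells can be skipped before any triangles are generated.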

Repositório da Universidade Nova de Lisboa