
    Markov chain Monte Carlo on the GPU

    Markov chains are a useful tool in statistics that allow us to sample and model a large population of individuals. We can extend this idea to the challenge of sampling solutions to problems. Using Markov chain Monte Carlo (MCMC) techniques, we can also approximate the number of solutions, with a confidence that depends on the number of samples used to compute the estimate. Although this approximation gives accurate results for very large problems, it is still computationally intensive, and many current algorithms use parallel implementations to improve their performance. Modern graphics processing units (GPUs) have increased in computational power very rapidly over the past few years. Due to their inherently parallel nature and increased flexibility for general purpose computation, they lend themselves well to building a framework for general purpose Markov chain simulation and evaluation. In addition, the majority of mid- to high-range workstations have graphics cards capable of supporting modern general purpose GPU (GPGPU) frameworks such as OpenCL, CUDA, or DirectCompute. This thesis presents work done to create a general purpose framework for Markov chain simulations and Markov chain Monte Carlo techniques on the GPU using the OpenCL toolkit. OpenCL is a platform- and hardware-independent GPGPU framework, which further increases the accessibility of the software. Due to the increasing power, flexibility, and prevalence of GPUs, a wider range of developers and researchers will be able to take advantage of a high performing general purpose framework in their research. A number of experiments are also conducted to demonstrate the benefits and feasibility of using the power of the GPU to solve Markov chain Monte Carlo problems.
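
To make the sampling idea in this abstract concrete, here is a minimal CPU-only sketch of a random-walk Metropolis sampler on a tiny discrete state space. It is not the thesis's OpenCL framework: the function name `metropolis_sample`, the ring-shaped state space, and the example weights are all assumptions chosen for illustration.

```python
import random


def metropolis_sample(weights, n_steps, seed=0):
    """Random-walk Metropolis over states 0..len(weights)-1 arranged in a ring.

    weights: unnormalized, strictly positive target probabilities (hypothetical input).
    Returns the number of steps spent in each state.
    """
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    counts = [0] * n
    for _ in range(n_steps):
        # Symmetric proposal: move to the left or right neighbour on the ring.
        proposal = (state + rng.choice((-1, 1))) % n
        # Accept with probability min(1, w(proposal) / w(state)).
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return counts


weights = [1.0, 2.0, 3.0, 4.0]
counts = metropolis_sample(weights, n_steps=200_000)
# Visit frequencies approximate the normalized target weights.
estimates = [c / sum(counts) for c in counts]
print(estimates)
```

Each chain is sequential, but independent chains with different seeds do not interact, which is one reason this kind of workload is often considered a reasonable fit for parallel hardware.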

    A GPU Powered Mobile AR Navigation System

    This thesis presents a real-time Augmented Reality Navigation System (ARNavi) for Android smartphones that leverages the parallel computing power of mobile GPUs. Unlike conventional navigation systems, the proposed ARNavi overlays navigation information onto the live video stream from the device camera in real time. To achieve this goal, we implement and accelerate the compute-intensive parts of the application using OpenCL on the GPU integrated in the mobile Application Processor (AP). The contributions of this thesis are three-fold. First, we propose a new lane detection algorithm and a prediction mechanism based on geometric coordinates. The results show that these two algorithms are fast and accurate. Second, we port and accelerate a complete application on the mobile AP. By taking advantage of CPU-GPU heterogeneous computing techniques, we achieve a performance boost of more than 2.6 times compared to the CPU-only version. Lastly, we successfully integrate OpenCL and OpenCV on the Android platform.

    A Language and Hardware Independent Approach to Quantum-Classical Computing

    Heterogeneous high-performance computing (HPC) systems offer novel architectures which accelerate specific workloads through judicious use of specialized coprocessors. A promising architectural approach for future scientific computations is provided by heterogeneous HPC systems integrating quantum processing units (QPUs). To this end, we present XACC (eXtreme-scale ACCelerator), a programming model and software framework that enables quantum acceleration within standard or HPC software workflows. XACC follows a coprocessor machine model that is independent of the underlying quantum computing hardware, thereby enabling quantum programs to be defined and executed on a variety of QPU types through a unified application programming interface. Moreover, XACC defines a polymorphic low-level intermediate representation and an extensible compiler frontend that enables language-independent quantum programming, thus promoting integration and interoperability across the quantum programming landscape. In this work we define the software architecture enabling our hardware- and language-independent approach, and demonstrate its usefulness across a range of quantum computing models through illustrative examples involving the compilation and execution of gate and annealing-based quantum programs.

    Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems

    A challenge that heterogeneous system programmers face is leveraging the performance of all the devices that make up the system. This paper presents Sigmoid, a new load balancing algorithm that efficiently co-executes a single OpenCL data-parallel kernel on all the devices of a heterogeneous system. Sigmoid splits the workload proportionally to the capabilities of the devices, drastically reducing response time and energy consumption. It is designed around several features: it is dynamic, adaptive, guided and effortless, as it does not require the user to supply any parameters and adapts to the behaviour of each kernel at runtime. To evaluate Sigmoid's performance, it has been implemented in Maat, a system abstraction library. Experimental results with different kernel types show that Sigmoid exhibits excellent performance, reaching a utilization of 90%, together with energy savings of up to 20%, while always reducing programming effort compared to OpenCL and facilitating portability to other heterogeneous machines. This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence.
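
The idea of splitting a workload in proportion to device capability can be sketched as follows. This is a static, one-shot illustration only and is not the Sigmoid algorithm, which the abstract describes as dynamic and adaptive at runtime. The function name `split_proportionally` and the relative speed figures are hypothetical.

```python
def split_proportionally(total_items, speeds):
    """Split total_items work-items among devices in proportion to their speeds.

    speeds: relative throughput estimates, one per device (assumed known up front).
    Returns one integer share per device; the shares always sum to total_items.
    """
    total_speed = sum(speeds)
    exact = [total_items * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]  # round every share down first
    leftover = total_items - sum(shares)
    # Give the remaining items to the devices with the largest fractional parts.
    by_fraction = sorted(range(len(speeds)),
                         key=lambda i: exact[i] - shares[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical machine: one CPU and two GPUs with relative speeds 1 : 3 : 6.
shares = split_proportionally(1000, [1.0, 3.0, 6.0])
print(shares)  # [100, 300, 600]
```

A static split like this depends entirely on the accuracy of the speed estimates, which is the kind of user-supplied parameter the abstract says Sigmoid avoids.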

    HIGH PERFORMANCE COMPUTING FOR RECONNAISSANCE APPLICATIONS

    Parallel programming is vital to fully utilize the multicore architectures that dominate the processor market. The market, however, is constantly evolving, with new processors and new architectures getting released annually. Using an open parallel processing language, such as OpenCL (Open Computing Language), enables the use of a single program across multiple architectures. It also enables a method of evaluation between multiple devices so the best choice can be made for a given application. In this research, OpenCL is used to evaluate the performance of two signal processing algorithms across two graphics processing units and one central processing unit. Experimental results show that for each algorithm, a specific device can clearly be shown to outperform the others. Ensign, United States Navy. Approved for public release; distribution is unlimited.

    ImageJ2: ImageJ for the next generation of scientific image data

    ImageJ is an image analysis program extensively used in the biological sciences and beyond. Due to its ease of use, recordable macro language, and extensible plugin architecture, ImageJ enjoys contributions from non-programmers, amateur programmers, and professional developers alike. Enabling such a diversity of contributors has resulted in a large community that spans the biological and physical sciences. However, a rapidly growing user base, diverging plugin suites, and technical limitations have revealed a clear need for a concerted software engineering effort to support emerging imaging paradigms and to ensure the software's ability to handle the requirements of modern science. Due to these new and emerging challenges in scientific imaging, ImageJ is at a critical development crossroads. We present ImageJ2, a total redesign of ImageJ offering a host of new functionality. It separates concerns, fully decoupling the data model from the user interface. It emphasizes integration with external applications to maximize interoperability. Its robust new plugin framework allows everything from image formats, to scripting languages, to visualization to be extended by the community. The redesigned data model supports arbitrarily large, N-dimensional datasets, which are increasingly common in modern image acquisition. Despite the scope of these changes, backwards compatibility is maintained such that this new functionality can be seamlessly integrated with the classic ImageJ interface, allowing users and developers to migrate to these new methods at their own pace. ImageJ2 provides a framework engineered for flexibility, intended to support these requirements as well as accommodate future needs.

    Parallel For Loops on Heterogeneous Resources

    In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal for not just graphical applications, but many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine is a difficult challenge, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil's ParallelFor construct and our novel PINA scheduler, which can efficiently load balance work onto multiple GPUs and CPUs simultaneously.

    Geometry based visualization with OpenCL

    Dissertation submitted to obtain the degree of Master in Informatics Engineering (Mestre em Engenharia Informática). This work targets the design and implementation of an isosurface extraction solution capable of handling large datasets. The Marching Cubes algorithm is the method used to extract the isosurfaces. These are graphical representations of points with a constant value (e.g. matter density) within volumetric datasets, and they are a very useful way to visualize particular regions of such data. One of the major goals of this work is to get a significant performance improvement compared to the currently available CPU solutions. The OpenCL framework, a recently proposed open standard for parallel programming of heterogeneous systems, is used to accelerate the solution. Unlike previous programming frameworks for GPUs such as CUDA, OpenCL allows the workload to be distributed among CPUs, GPUs, DSPs, and other similar microprocessors.
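
As a small illustration of the first step of Marching Cubes, the sketch below classifies one grid cell by comparing its eight corner samples with the isovalue and packing the results into an 8-bit case index (256 possible cases). It is not taken from the dissertation: the function name `cube_index`, the corner ordering, and the "bit set when the corner is below the isovalue" convention are assumptions, and implementations differ on them.

```python
def cube_index(corner_values, isovalue):
    """Return the 8-bit Marching Cubes case index for one grid cell.

    corner_values: the scalar field sampled at the cell's 8 corners
                   (corner ordering is an assumption of this sketch).
    Bit i is set when corner i lies below the isovalue.
    """
    if len(corner_values) != 8:
        raise ValueError("a cube has exactly 8 corners")
    index = 0
    for bit, value in enumerate(corner_values):
        if value < isovalue:
            index |= 1 << bit
    return index


# Only corner 0 is below the isovalue, so only bit 0 is set.
print(cube_index([0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], 0.5))  # 1
```

Cells with index 0 or 255 lie entirely on one side of the isosurface and produce no triangles. Every cell can be classified independently of the others, which is what makes this step straightforward to run as a data-parallel kernel.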