9,858 research outputs found

    A Modeling Approach based on UML/MARTE for GPU Architecture

    Get PDF
    Nowadays, the High Performance Computing is part of the context of embedded systems. Graphics Processing Units (GPUs) are more and more used in acceleration of the most part of algorithms and applications. Over the past years, not many efforts have been done to describe abstractions of applications in relation to their target architectures. Thus, when developers need to associate applications and GPUs, for example, they find difficulty and prefer using API for these architectures. This paper presents a metamodel extension for MARTE profile and a model for GPU architectures. The main goal is to specify the task and data allocation in the memory hierarchy of these architectures. The results show that this approach will help to generate code for GPUs based on model transformations using Model Driven Engineering (MDE).Comment: Symposium en Architectures nouvelles de machines (SympA'14) (2011

    Benchmarking CPUs and GPUs on embedded platforms for software receiver usage

    Get PDF
    Smartphones containing multi-core central processing units (CPUs) and powerful many-core graphics processing units (GPUs) bring supercomputing technology into your pocket (or into our embedded devices). This can be exploited to produce power-efficient, customized receivers with flexible correlation schemes and more advanced positioning techniques. For example, promising techniques such as the Direct Position Estimation paradigm or usage of tracking solutions based on particle filtering, seem to be very appealing in challenging environments but are likewise computationally quite demanding. This article sheds some light onto recent embedded processor developments, benchmarks Fast Fourier Transform (FFT) and correlation algorithms on representative embedded platforms and relates the results to the use in GNSS software radios. The use of embedded CPUs for signal tracking seems to be straight forward, but more research is required to fully achieve the nominal peak performance of an embedded GPU for FFT computation. Also the electrical power consumption is measured in certain load levels.Peer ReviewedPostprint (published version

    Brook Auto: High-Level Certification-Friendly Programming for GPU-powered Automotive Systems

    Get PDF
    Modern automotive systems require increased performance to implement Advanced Driving Assistance Systems (ADAS). GPU-powered platforms are promising candidates for such computational tasks, however current low-level programming models challenge the accelerator software certification process, while they limit the hardware selection to a fraction of the available platforms. In this paper we present Brook Auto, a high-level programming language for automotive GPU systems which removes these limitations. We describe the challenges and solutions we faced in its implementation, as well as a complete evaluation in terms of performance and productivity, which shows the effectiveness of our method.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    Full text link
    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie

    Efficient transfer entropy analysis of non-stationary neural time series

    Full text link
    Information theory allows us to investigate information processing in neural systems in terms of information transfer, storage and modification. Especially the measure of information transfer, transfer entropy, has seen a dramatic surge of interest in neuroscience. Estimating transfer entropy from two processes requires the observation of multiple realizations of these processes to estimate associated probability density functions. To obtain these observations, available estimators assume stationarity of processes to allow pooling of observations over time. This assumption however, is a major obstacle to the application of these estimators in neuroscience as observed processes are often non-stationary. As a solution, Gomez-Herrero and colleagues theoretically showed that the stationarity assumption may be avoided by estimating transfer entropy from an ensemble of realizations. Such an ensemble is often readily available in neuroscience experiments in the form of experimental trials. Thus, in this work we combine the ensemble method with a recently proposed transfer entropy estimator to make transfer entropy estimation applicable to non-stationary time series. We present an efficient implementation of the approach that deals with the increased computational demand of the ensemble method's practical application. In particular, we use a massively parallel implementation for a graphics processing unit to handle the computationally most heavy aspects of the ensemble method. We test the performance and robustness of our implementation on data from simulated stochastic processes and demonstrate the method's applicability to magnetoencephalographic data. While we mainly evaluate the proposed method for neuroscientific data, we expect it to be applicable in a variety of fields that are concerned with the analysis of information transfer in complex biological, social, and artificial systems.Comment: 27 pages, 7 figures, submitted to PLOS ON
    • …
    corecore