Search CORE

1,219 research outputs found

Platform Dependent Verification: On Engineering Verification Tools for 21st Century

Author: A. Aggarwal
A. B. Kahn
Alfons Laarman
Armin Biere
B. R. Haverkort
Boudewijn R. Haverkort
Brad Bingham
Cornelia P. Inggs
D. Bosnacki
David L. Dill
Doron Peled
E. Allen Emerson
E. M. Clarke
E.M. Clarke
Flavio Lerda
Flavio Lerda
G. Behrmann
G. Ciardo
G. Jayachandran
Gerard J. Holzmann
Gerard J. Holzmann
Gerard J. Holzmann
Gianfranco Ciardo
Giuseppe Della Penna
H. Garavel
I. Černá
I. Černá
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. Barnat
J. R. Burch
Jaco Geldenhuys
Jiří Barnat
Jiří Barnat
K. Verstoep
Keijo Heljanko
Keijo Heljanko
L. Brim
L. Brim
Luboš Brim
M.Y. Vardi
Michael Jones
Moritz Hammer
Naga K. Govindaraju
P. Harish
Peter Lamborn
R. Korf
R. Korf
R. Pel\IeC ánek
Rahul Kumar
Rong Zhou
S. Allmaier
S. Caselli
Sami Evangelista
Shahid Jabbar
Shahid Jabbar
Stefan Edelkamp
T. von Eicken
Tonglaga Bao
U. Stern
U. Stern
W. Knottenbelt
W. Knottenbelt
Yi-Jen Chiang
Publication venue: 'Open Publishing Association'
Publication date: 01/10/2011
Field of study

The paper overviews recent developments in platform-dependent explicit-state LTL model checking.Comment: In Proceedings PDMC 2011, arXiv:1111.006

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Improved Software Implementation of DES Using CUDA and OpenCL

Author: Engsig-Karup Allan Peter
Noer Dennis
Zenner Erik
Publication venue
Publication date: 01/01/2011
Field of study

Online Research Database In Technology

A GPU based real-time software correlation system for the Murchison Widefield Array prototype

Author: Briggs Frank H.
Greenhill Lincoln J.
Wayth Randall B.
Publication venue: 'University of Chicago Press'
Publication date: 01/01/2009
Field of study

Modern graphics processing units (GPUs) are inexpensive commodity hardware that offer Tflop/s theoretical computing capacity. GPUs are well suited to many compute-intensive tasks including digital signal processing. We describe the implementation and performance of a GPU-based digital correlator for radio astronomy. The correlator is implemented using the NVIDIA CUDA development environment. We evaluate three design options on two generations of NVIDIA hardware. The different designs utilize the internal registers, shared memory and multiprocessors in different ways. We find that optimal performance is achieved with the design that minimizes global memory reads on recent generations of hardware. The GPU-based correlator outperforms a single-threaded CPU equivalent by a factor of 60 for a 32 antenna array, and runs on commodity PC hardware. The extra compute capability provided by the GPU maximises the correlation capability of a PC while retaining the fast development time associated with using standard hardware, networking and programming languages. In this way, a GPU-based correlation system represents a middle ground in design space between high performance, custom built hardware and pure CPU-based software correlation. The correlator was deployed at the Murchison Widefield Array 32 antenna prototype system where it ran in real-time for extended periods. We briefly describe the data capture, streaming and correlation system for the prototype array.Comment: 11 pages, to appear in PAS

arXiv.org e-Print Archive

Crossref

The Australian National University

espace@Curtin

GraphR: Accelerating Graph Processing Using ReRAM

Author: Chen Yiran
Li Hai
Qian Xuehai
Song Linghao
Zhuo Youwei
Publication venue
Publication date: 08/12/2017
Field of study

This paper presents GRAPHR, the first ReRAM-based graph processing accelerator. GRAPHR follows the principle of near-data processing and explores the opportunity of performing massive parallel analog operations with low hardware and energy cost. The analog computation is suit- able for graph processing because: 1) The algorithms are iterative and could inherently tolerate the imprecision; 2) Both probability calculation (e.g., PageRank and Collaborative Filtering) and typical graph algorithms involving integers (e.g., BFS/SSSP) are resilient to errors. The key insight of GRAPHR is that if a vertex program of a graph algorithm can be expressed in sparse matrix vector multiplication (SpMV), it can be efficiently performed by ReRAM crossbar. We show that this assumption is generally true for a large set of graph algorithms. GRAPHR is a novel accelerator architecture consisting of two components: memory ReRAM and graph engine (GE). The core graph computations are performed in sparse matrix format in GEs (ReRAM crossbars). The vector/matrix-based graph computation is not new, but ReRAM offers the unique opportunity to realize the massive parallelism with unprecedented energy efficiency and low hardware cost. With small subgraphs processed by GEs, the gain of performing parallel operations overshadows the wastes due to sparsity. The experiment results show that GRAPHR achieves a 16.01x (up to 132.67x) speedup and a 33.82x energy saving on geometric mean compared to a CPU baseline system. Com- pared to GPU, GRAPHR achieves 1.69x to 2.19x speedup and consumes 4.77x to 8.91x less energy. GRAPHR gains a speedup of 1.16x to 4.12x, and is 3.67x to 10.96x more energy efficiency compared to PIM-based architecture.Comment: Accepted to HPCA 201

arXiv.org e-Print Archive

Crossref

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

Author: Cosco Francesco
Dendaluce Jahnke Martin
Gomez-Garay Vicente
Novickis Rihards
Pérez Rastelli Joshué
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

TECNALIA Publications

GPU Integration into a Software Defined Radio Framework

Author: Millage Joel Gregory
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2010
Field of study

Software Defined Radio (SDR) was brought about by moving processing done on specific hardware components to reconfigurable software. Hardware components like General Purpose Processors (GPPs), Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs) are used to make the software and hardware processing of the radio more portable and as efficient as possible. Graphics Processing Units (GPUs) designed years ago for video rendering, are now finding new uses in research. The parallel architecture provided by the GPU gives developers the ability to speed up the performance of computationally intense programs. An open source tool for SDR, Open Source Software Communications Architecture (SCA) Implementation: Embedded (OSSIE), is a free waveform development environment for any developer who wants to experiment with SDR. In this work, OSSIE is integrated with a GPU computing framework to show how performance improvement can be gained from GPU parallelization. GPU research performed with SDR encompasses improving SDR simulations to implementing specific wireless protocols. In this thesis, we are aiming to show performance improvement within an SCA architected SDR implementation. The software components within OSSIE gained significant performance increases with little software changes due to the natural parallelism of the GPU, using Compute Unified Device Architecture (CUDA), Nvidia\u27s GPU programming API. Using sample data sizes for the I and Q channel inputs, performance improvements were seen in as little as 512 samples when using the GPU optimized version of OSSIE. As the sample size increased, the CUDA performance improved as well. Porting OSSIE components onto the CUDA architecture showed that improved performance can be seen in SDR related software through the use of GPU technology

Digital Repository @ Iowa State University (ISU)

ASCR/HEP Exascale Requirements Review Report

This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.Comment: 77 pages, 13 Figures; draft report, subject to further revisio

arXiv.org e-Print Archive

eScholarship - University of California

Exploiting BSP Abstractions for Compiler Based Optimizations of GPU Applications on multi-GPU Systems

Author: Matz Alexander
Publication venue
Publication date: 01/01/2020
Field of study

Graphics Processing Units (GPUs) are accelerators for computers and provide massive amounts of computational power and bandwidth for amenable applications. While effectively utilizing an individual GPU already requires a high level of skill, effectively utilizing multiple GPUs introduces completely new types of challenges. This work sets out to investigate how the hierarchical execution model of GPUs can be exploited to simplify the utilization of such multi-GPU systems. The investigation starts with an analysis of the memory access patterns exhibited by applications from common GPU benchmark suites. Memory access patterns are collected using custom instrumentation and a simple simulation then analyzes the patterns and identifies implicit communication across the different levels of the execution hierarchy. The analysis reveals that for most GPU applications memory accesses are highly localized and there exists a way to partition the workload so that the communication volume grows slower than the aggregated bandwidth for growing numbers of GPUs. Next, an application model based on Z-polyhedra is derived that formalizes the distribution of work across multiple GPUs and allows the identification of data dependencies. The model is then used to implement a prototype compiler that consumes single-GPU programs and produces executables that distribute GPU workloads across all available GPUs in a system. It uses static analysis to identify memory access patterns and polyhedral code generation in combination with a dynamic tracking system to efficiently resolve data dependencies. The prototype is implemented as an extension to the LLVM/Clang compiler and published in full source. The prototype compiler is then evaluated using a set of benchmark applications. While the prototype is limited in its applicability by technical issues, it provides impressive speedups of up to 12.4x on 16 GPUs for amenable applications. An in-depth analysis of the application runtime reveals that dependency resolution takes up less than 10% of the runtime, often significantly less. A discussion follows and puts the work into context by presenting and differentiating related work, reflecting critically on the work itself and an outlook of the aspects that could be explored as part of this research. The work concludes with a summary and a closing opinion

Heidelberger Dokumentenserver