Search CORE

5,707 research outputs found

A Similarity Measure for GPU Kernel Subgraph Matching

Author: A Sabne
BP Miller
C Böhm
F Zhang
G Ammons
L Adhianto
MH Williams
R Lim
R Singh
RC Gonzales
SS Shende
T Ball
Publication venue
Publication date: 21/03/2019
Field of study

Accelerator architectures specialize in executing SIMD (single instruction, multiple data) in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures this information in a control flow graph (CFG) and performs subgraph matching across various kernel's CFGs to gain insights to an application's resource requirements, based on the shape and traversal of the graph, instruction operations executed and registers allocated, among other information. The utility of CUDAflow is demonstrated with SHOC and Rodinia application case studies on a variety of GPU architectures, revealing novel thread divergence characteristics that facilitates end users, autotuners and compilers in generating high performing code

arXiv.org e-Print Archive

Resource Allocation for Power Minimization in the Downlink of THP-based Spatial Multiplexing MIMO-OFDMA Systems

Author: Moretti Marco
Sanguinetti Luca
Wang Xiaodong
Publication venue
Publication date: 18/04/2014
Field of study

In this work, we deal with resource allocation in the downlink of spatial multiplexing MIMO-OFDMA systems. In particular, we concentrate on the problem of jointly optimizing the transmit and receive processing matrices, the channel assignment and the power allocation with the objective of minimizing the total power consumption while satisfying different quality-of-service requirements. A layered architecture is used in which users are first partitioned in different groups on the basis of their channel quality and then channel assignment and transceiver design are sequentially addressed starting from the group of users with most adverse channel conditions. The multi-user interference among users belonging to different groups is removed at the base station using a Tomlinson-Harashima pre-coder operating at user level. Numerical results are used to highlight the effectiveness of the proposed solution and to make comparisons with existing alternatives.Comment: 12 pages, 6 figures, IEEE Trans. Veh. Techno

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

HAL-Rennes 1

Algorithm and VLSI Architecture Design for MPEG-Like High Definition Video Coding‐AVS Video Coding from Standard Specification to VLSI Implementation

Author: Yin Haibing
Publication venue: 'IntechOpen'
Publication date: 09/01/2013
Field of study

Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing

Author: Uçar Bora
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/08/2014
Field of study

Book of Abstracts of CSC14 edited by Bora UçarInternational audienceThe Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France on 21st to 23rd July, 2014. This two and a half day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. All three invited talks were focused on two interesting fields of research specifically: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multi-pole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency), and by SIAM

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot