Dynamic Configuration of CUDA Runtime Variables for CDP-based Divide-and-Conquer Algorithms
CUDA Dynamic Parallelism (CDP) is an extension of the GPGPU programming model proposed to better address irregular applications and recursive patterns of computation. However, processing memory-demanding problems with CDP is not straightforward, because of its particular memory organization. This work presents an algorithm that deals with this issue: it dynamically calculates and configures the CDP runtime variables and the GPU heap based on an analysis of the partial backtracking tree. The proposed algorithm was implemented for solving permutation combinatorial problems and evaluated on two test cases: N-Queens and the Asymmetric Travelling Salesman Problem. It allows different CDP-based backtracking algorithms from the literature to solve memory-demanding problems adaptively, with respect to the number of recursive kernel generations and the presence of dynamic allocations on the GPU.
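As a rough illustration of what "configuring the CDP runtime variables and the GPU heap" involves, the following host-side C++ sketch sets the CUDA device limits that govern dynamic parallelism and in-kernel allocation before launching a CDP kernel. The sizing functions (estimatePendingLaunches, estimateHeapBytes) are hypothetical placeholders standing in for the paper's analysis of the partial backtracking tree, not the authors' actual heuristics.

// Sketch (C++): host-side configuration of CDP runtime limits and the device
// heap via the CUDA runtime API. The estimates below are illustrative only.

#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical estimates derived from a partial backtracking tree.
std::size_t estimatePendingLaunches(int activeNodes) { return 2048 + 4 * (std::size_t)activeNodes; }
std::size_t estimateHeapBytes(int activeNodes, int n) { return (std::size_t)activeNodes * n * sizeof(int) * 8; }

bool configureCdpRuntime(int activeNodes, int permutationSize, int nestingDepth) {
    // Depth of nested kernel generations that may synchronize on one another.
    if (cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, nestingDepth) != cudaSuccess) return false;
    // Number of device-side kernel launches that may be queued at once.
    if (cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount,
                           estimatePendingLaunches(activeNodes)) != cudaSuccess) return false;
    // Device heap used by in-kernel malloc/free during node expansion.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize,
                           estimateHeapBytes(activeNodes, permutationSize)) != cudaSuccess) return false;
    return true;
}

These limits must be set before the first launch of a kernel that uses the device runtime; the point of the paper's algorithm is to choose such values adaptively rather than fixing them once.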
Building and Combining Matching Algorithms
The concept of matching is ubiquitous in declarative programming and in automated reasoning. For instance, it is a key mechanism for running rule-based programs and for simplifying clauses generated by theorem provers. A matching problem can be seen as a particular conjunction of equations in which each equation has a ground side. We give an overview of techniques that can be applied to build and combine matching algorithms. First, we survey mutation-based techniques as a way to build a generic matching algorithm for a large class of equational theories. Second, combination techniques are introduced to obtain combined matching algorithms for disjoint unions of theories. Then we show how these combination algorithms can be extended to handle non-disjoint unions of theories sharing only constructors. These extensions are possible if an appropriate notion of normal form is computable.
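To make the notion of "an equation with a ground side" concrete, here is a minimal C++ sketch of syntactic matching in the empty equational theory: it tries to extend a substitution so that the pattern instantiates to a given ground term. The term representation and names are assumptions for illustration; the mutation-based and combination techniques surveyed in the paper are layered on top of procedures of this kind.

// Sketch (C++): syntactic matching of a pattern against a ground term.

#include <map>
#include <string>
#include <vector>

struct Term {
    bool isVar = false;
    std::string symbol;          // variable name or function symbol
    std::vector<Term> args;      // empty for variables and constants
};

bool equalTerms(const Term& a, const Term& b) {
    if (a.isVar != b.isVar || a.symbol != b.symbol || a.args.size() != b.args.size()) return false;
    for (std::size_t i = 0; i < a.args.size(); ++i)
        if (!equalTerms(a.args[i], b.args[i])) return false;
    return true;
}

// Try to extend 'subst' so that pattern[subst] equals ground; false if no match.
bool match(const Term& pattern, const Term& ground, std::map<std::string, Term>& subst) {
    if (pattern.isVar) {
        auto it = subst.find(pattern.symbol);
        if (it == subst.end()) { subst[pattern.symbol] = ground; return true; }
        return equalTerms(it->second, ground);   // consistent with an earlier binding?
    }
    if (ground.isVar || pattern.symbol != ground.symbol ||
        pattern.args.size() != ground.args.size()) return false;
    for (std::size_t i = 0; i < pattern.args.size(); ++i)
        if (!match(pattern.args[i], ground.args[i], subst)) return false;
    return true;
}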
Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap
In earlier work, we showed that the one-sided communication model found in PGAS languages (such as UPC) offers significant advantages in communication efficiency by decoupling data transfer from processor synchronization. We explore the use of the PGAS model on IBM BlueGene/P, an architecture that combines low-power, quad-core processors with extreme scalability. We demonstrate that the PGAS model, using a new port of the Berkeley UPC compiler and the GASNet one-sided communication layer, outperforms two-sided (MPI) communication in both microbenchmarks and a case study of the communication-limited benchmark NAS FT. We scale the benchmark up to 16,384 cores of the BlueGene/P and demonstrate that UPC consistently outperforms MPI, by as much as 66% for some processor configurations and by an average of 32%. In addition, the results demonstrate the scalability of the PGAS model and the Berkeley implementation of UPC, the viability of using it on machines with multicore nodes, and the effectiveness of the BG/P communication layer for supporting one-sided communication and PGAS languages. © 2009 IEEE.
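The key idea, decoupling the data transfer from the target's participation, can be illustrated outside UPC/GASNet. The C++ sketch below uses MPI's RMA interface as an analogous one-sided API: a rank writes directly into another rank's exposed window with MPI_Put, and no matching receive is ever posted. This only illustrates the model; it is not the Berkeley UPC/GASNet implementation evaluated in the paper.

// Sketch (C++ with MPI RMA): one-sided put into a remote memory window.

#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> window(1024, 0.0);           // memory exposed to remote puts
    MPI_Win win;
    MPI_Win_create(window.data(), window.size() * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                           // open an access epoch
    if (rank == 0 && size > 1) {
        std::vector<double> payload(1024, 3.14);
        // Data transfer with no receive call on the target rank.
        MPI_Put(payload.data(), 1024, MPI_DOUBLE, 1, 0, 1024, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);                           // synchronization, separate from the transfer
    if (rank == 1) std::printf("window[0] = %f\n", window[0]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}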
Tuning collective communication for Partitioned Global Address Space programming models
Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared-memory programming style combined with the locality control necessary to run on large-scale distributed-memory systems. Even within a PGAS language, programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads works together to perform the operation. In this paper we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and the implementation. In particular, PGAS collectives raise semantic issues that differ from those in send-receive-style message-passing programs, and they admit different implementation approaches that take advantage of the one-sided communication style of these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared-memory, distributed-memory and hybrid systems. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. Finally, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and show that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines, including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with an InfiniBand interconnect. © 2011 Elsevier B.V. All rights reserved.
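The "broad set of algorithms per collective, over which the implementation may be automatically tuned" amounts to a dispatch over candidate algorithms keyed by message size and geometry. The C++ sketch below shows that dispatch schematically; the algorithm names, thresholds and structure are illustrative assumptions, not GASNet's actual interface, which measures candidates and records the best choice per machine and configuration.

// Sketch (C++): schematic selection among broadcast algorithms, as an
// auto-tuned collectives framework might perform it.

#include <cstddef>

using BcastFn = void(*)(void* buf, std::size_t bytes, int root);

void bcastFlatTree(void*, std::size_t, int)         { /* root sends directly to every thread */ }
void bcastBinomialTree(void*, std::size_t, int)     { /* log(P) rounds of forwarding */ }
void bcastScatterAllgather(void*, std::size_t, int) { /* scatter chunks, then re-assemble */ }

struct TunedBroadcast {
    std::size_t smallCutoff  = 2048;       // illustrative thresholds; in practice
    std::size_t mediumCutoff = 1 << 20;    // these would be measured per machine
    int threads = 1;

    void run(void* buf, std::size_t bytes, int root) const {
        BcastFn chosen =
            (bytes <= smallCutoff || threads <= 8) ? bcastFlatTree
          : (bytes <= mediumCutoff)                ? bcastBinomialTree
                                                   : bcastScatterAllgather;
        chosen(buf, bytes, root);     // dispatch to the tuned choice
    }
};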
A GPU-Based Backtracking Algorithm for Permutation Combinatorial Problems
This work presents a GPU-based backtracking algorithm for permutation combinatorial problems built on the Integer-Vector-Matrix (IVM) data structure, a data structure dedicated to permutation combinatorial optimization problems. In this algorithm, load balancing is performed without intervention of the CPU, inside a work-stealing phase invoked after each node-expansion phase. The proposed work-stealing approach uses a virtual n-dimensional hypercube topology and a triggering mechanism to reduce the overhead incurred by dynamic load balancing. We implemented this algorithm for solving instances of the Asymmetric Travelling Salesman Problem by implicit enumeration, a scenario in which the cost of node evaluation is low compared to the overall search procedure. Experimental results show that the dynamically load-balanced IVM algorithm reaches speed-ups of up to 17X over a serial implementation using a bitset data structure and up to 2X over its GPU counterpart.
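The hypercube topology used by the work-stealing phase has a simple structure: with 2^d workers, a worker's neighbours are the ids that differ from its own in exactly one bit. The C++ sketch below shows this victim-selection idea host-side, with a simplified triggering condition; in the actual algorithm the equivalent logic runs on the GPU between node-expansion phases, and the names and policy here are assumptions for illustration.

// Sketch (C++): victim selection on a virtual d-dimensional hypercube.

#include <cstdint>
#include <vector>

// Workers are numbered 0 .. 2^d - 1; neighbours differ in exactly one bit.
std::vector<uint32_t> hypercubeNeighbours(uint32_t worker, unsigned d) {
    std::vector<uint32_t> neighbours;
    for (unsigned k = 0; k < d; ++k)
        neighbours.push_back(worker ^ (1u << k));
    return neighbours;
}

// Simplified trigger: a worker only attempts to steal when its own share of
// the search space is exhausted, limiting load-balancing overhead.
int chooseVictim(uint32_t worker, unsigned d, const std::vector<std::size_t>& workLeft) {
    if (workLeft[worker] > 0) return -1;              // still busy: no steal attempt
    int best = -1;
    for (uint32_t n : hypercubeNeighbours(worker, d))
        if (best < 0 || workLeft[n] > workLeft[(uint32_t)best]) best = (int)n;
    return (best >= 0 && workLeft[(uint32_t)best] > 0) ? best : -1;
}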
Linguistic primitives for replica-aware coordination offer suitable solutions to the challenging problems of data distribution and locality in large-scale high-performance computing. The data-replication mechanisms previously designed to extend Klaim with replicated tuples are used here to experiment with X10, a parallel programming language primarily targeting clusters of multi-core processors linked into a large-scale system via high-performance networks. Our approach aims at allowing the programmer to specify and coordinate the replication of shared data items while taking into account the desired consistency properties. The programmer can thus exploit these flexible mechanisms to adapt data distribution and locality to the needs of the application, in order to improve performance in terms of concurrency and data access. We investigate issues related to replica consistency and provide a performance analysis, which includes scenarios where replica-based specifications and relaxed consistency provide significant performance gains.
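The trade-off between consistency and performance that the analysis explores can be pictured with a toy example. The C++ sketch below contrasts a "strong" write, which updates every replica before returning, with a "relaxed" write, which updates the local copy and defers propagation. The class and names are purely illustrative assumptions; the actual mechanisms are Klaim-inspired replicated-tuple primitives expressed in X10 over places, not this code.

// Sketch (C++): a toy replicated register showing strong vs. relaxed writes.

#include <cstddef>
#include <functional>
#include <vector>

enum class Consistency { Strong, Relaxed };

struct ReplicatedRegister {
    std::vector<int> replicas;                        // one copy per "place"
    std::vector<std::function<void()>> pending;       // deferred propagations

    explicit ReplicatedRegister(std::size_t places) : replicas(places, 0) {}

    void write(std::size_t origin, int value, Consistency mode) {
        replicas[origin] = value;
        if (mode == Consistency::Strong) {
            for (int& r : replicas) r = value;        // all replicas before returning
        } else {
            pending.push_back([this, value] {         // propagate lazily later
                for (int& r : replicas) r = value;
            });
        }
    }

    void flush() {                                    // relaxed replicas converge here
        for (auto& f : pending) f();
        pending.clear();
    }
};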