Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing
On GPU-equipped clusters, keeping the load balanced between GPUs and CPU cores is a critical issue for work sharing among these heterogeneous computing resources. We have been developing a runtime system for this problem on the PGAS language XcalableMP, named XcalableMP-dev/StarPU [1]. Through this development, we found that adaptive load balancing for GPU/CPU work sharing is necessary to achieve the best performance across various application codes. In this paper, we enhance our language system XcalableMP-dev/StarPU with a new feature that dynamically controls the task size assigned to these heterogeneous resources during application execution. Performance evaluation on several benchmarks confirms that the proposed feature works correctly and that heterogeneous work sharing provides up to about 40% higher performance than GPU-only execution, even for relatively small problem sizes.
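The dynamic task-size control described above can be sketched as a simple feedback loop. This is a minimal illustration under assumed constant per-item cost models; the function and parameter names (`adaptive_split`, `cpu_cost`, `gpu_cost`) are hypothetical stand-ins for measured kernel timings, not part of the actual XcalableMP-dev/StarPU API:

```python
def adaptive_split(total_items, steps, cpu_cost=4e-6, gpu_cost=1e-6):
    """Rebalance the GPU/CPU work split over several steps based on
    measured per-item execution times, so both resources finish together."""
    ratio = 0.5  # fraction of the work assigned to the GPU
    for _ in range(steps):
        gpu_items = max(int(total_items * ratio), 1)
        cpu_items = max(total_items - gpu_items, 1)
        # Stand-ins for real kernel timings (constant per-item costs here).
        t_gpu = gpu_items * gpu_cost
        t_cpu = cpu_items * cpu_cost
        # Give the faster resource proportionally more work next step.
        ratio = (t_cpu / cpu_items) / (t_cpu / cpu_items + t_gpu / gpu_items)
    return ratio

# With a CPU four times slower per item, the split converges to 80% GPU.
ratio = adaptive_split(1_000_000, 5)
```

In the real runtime the per-item costs would come from profiling each kernel at execution time rather than from fixed constants.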
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing
In numerical computations, the precision of floating-point operations is a key
factor determining both performance (speed and energy efficiency) and
reliability (accuracy and reproducibility). However, raising precision improves
reliability at the cost of performance, and vice versa. The ultimate concept
for maximizing both at the same time is therefore minimal-precision computing
through precision-tuning, which selects the optimal precision for each
operation and datum. Several studies have already addressed this (e.g.
Precimonious and Verrou), but their scope is limited to precision-tuning alone.
Hence, we aim to propose a broader concept: a minimal-precision computing
system with precision-tuning that involves both the hardware and software
stack.
In 2019, we started the Minimal-Precision Computing project to pursue this
concept. Specifically, our system combines (1) a precision-tuning method based
on Discrete Stochastic Arithmetic (DSA), (2) arbitrary-precision arithmetic
libraries, (3) fast and accurate numerical libraries, and (4) Field-Programmable
Gate Arrays (FPGAs) with High-Level Synthesis (HLS).
In this white paper, we provide an overview of technologies related to
minimal- and mixed-precision computing, outline the future direction of the
project, and discuss current challenges together with our project members and
guest speakers at the LSPANC 2020 workshop:
https://www.r-ccs.riken.jp/labs/lpnctrt/lspanc2020jan/
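As a toy illustration of the precision-tuning idea (not the project's DSA-based method), the sketch below searches for the smallest working precision of a whole computation using Python's `decimal` module; real tools such as Precimonious tune precision per variable rather than globally:

```python
from decimal import Decimal, getcontext

def minimal_precision(f, reference, tol, max_digits=50):
    """Return the smallest number of significant decimal digits at which
    f() still agrees with a high-precision reference within tol."""
    for digits in range(1, max_digits + 1):
        getcontext().prec = digits  # lower the working precision globally
        if abs(f() - reference) <= tol:
            return digits
    return None

# Reference value computed at high precision.
getcontext().prec = 50
reference = Decimal(1) / 7 + Decimal(1) / 11

digits = minimal_precision(lambda: Decimal(1) / 7 + Decimal(1) / 11,
                           reference, Decimal("1e-6"))
print(digits)  # 6 digits suffice for an absolute error below 1e-6
```

The same search structure applies when the tunable knob is a hardware format (e.g. an FPGA datapath width) instead of a software precision setting.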
CP-PACS : A massively parallel processor for large scale scientific calculations
CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processor with 2048 processing units built at the Center for Computational Physics, University of Tsukuba. It has an MIMD architecture with distributed memory. The node processor of CP-PACS is a RISC microprocessor enhanced with a Pseudo Vector Processing feature, which realizes high-performance vector processing. The interconnection network is a 3-dimensional Hyper-Crossbar Network, which offers high flexibility and embeddability for various network topologies and communication patterns. The theoretical peak performance of the whole system is 614.4 GFLOPS. In this paper, we give an overview of the CP-PACS architecture and several of its special architectural characteristics. We then describe performance evaluations of both the single node processor and the parallel system, based on LINPACK and the Kernel CG of the NAS Parallel Benchmarks. Through these evaluations, the effectiveness of Pseudo Vector Processing…
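A quick arithmetic cross-check of the figures above (our derivation, not stated explicitly in the abstract): 614.4 GFLOPS spread over 2048 processing units implies a per-node theoretical peak of 300 MFLOPS.

```python
# Per-node theoretical peak implied by the system-wide figures.
nodes = 2048
peak_total_gflops = 614.4
per_node_mflops = peak_total_gflops * 1e3 / nodes
print(per_node_mflops)  # ≈ 300 MFLOPS per processing unit
```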
Performance Improvement for Matrix Calculation on CP-PACS Node Processor
CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processing system with 2048 node processors for large-scale scientific calculations. Each node processor of CP-PACS has a special hardware feature called PVP-SW (Pseudo Vector Processor based on Slide Window), which realizes efficient vector processing on a superscalar processor without depending on the cache. In this paper, we present the effectiveness of PVP-SW through performance measurements of the LINPACK benchmark on a single node processor. Utilizing loop unrolling techniques and the Block-TLB feature, the PVP-SW function improves the basic performance by up to 3.3 times for 1000 × 1000 LINPACK. This performance corresponds to 73% of the theoretical peak.

1 Introduction

For efficient large-scale scientific calculations on massively parallel processors (MPPs), the sustained performance of each node processor must be high enough, in addition to increasing the number of node processors. CP-PACS [1] (Computational Physics by Parallel Array Computer System)…
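The reported figures can be cross-checked with a little arithmetic: the CP-PACS peak of 614.4 GFLOPS over 2048 nodes gives 300 MFLOPS per node, so 73% of peak is about 219 MFLOPS sustained for 1000 × 1000 LINPACK, and a 3.3× speedup then implies a baseline of roughly 66 MFLOPS without PVP-SW. The baseline figure is our inference, not stated in the paper:

```python
# Figures from the abstract, combined into a rough consistency check.
per_node_peak_mflops = 614.4e3 / 2048            # ~300 MFLOPS per node
sustained_mflops = 0.73 * per_node_peak_mflops   # 73% of peak
baseline_mflops = sustained_mflops / 3.3         # implied pre-PVP-SW baseline
print(round(sustained_mflops), round(baseline_mflops))  # ≈ 219 and ≈ 66
```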