Compiling Geometric Algebra Computations into Reconfigurable Hardware Accelerators
Geometric Algebra (GA), a generalization of quaternions and complex numbers, is a very powerful framework for intuitively expressing and manipulating the complex geometric relationships common to engineering problems. However, the actual evaluation of GA expressions is very compute intensive, and acceleration is generally required for practical use. GPUs and FPGAs offer such acceleration while requiring only little power per operation.
In this paper, we present key components of a proof-of-concept compile flow that combines symbolic and hardware optimization techniques to automatically generate, from abstract GA descriptions, hardware accelerators suitable for high-performance embedded computing.
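To make the computational load concrete, the following minimal C sketch (not taken from the paper's tool chain; the type and function names are ours) evaluates the full geometric product of two dense multivectors in the 2D algebra Cl(2,0). Even in two dimensions a dense multivector has 2^2 = 4 coefficients and the product needs 4 x 4 = 16 multiply-adds; the count grows as 4^n with the dimension n, which is what makes direct GA evaluation expensive and an attractive target for generated hardware.

    #include <stdio.h>

    /* Dense multivector in Cl(2,0): scalar, e1, e2 and the bivector e12. */
    typedef struct { float s, e1, e2, e12; } mv2;

    /* Full geometric product: 16 multiply-adds even for this tiny algebra. */
    static mv2 geometric_product(mv2 a, mv2 b)
    {
        mv2 r;
        r.s   = a.s*b.s   + a.e1*b.e1 + a.e2*b.e2  - a.e12*b.e12;
        r.e1  = a.s*b.e1  + a.e1*b.s  - a.e2*b.e12 + a.e12*b.e2;
        r.e2  = a.s*b.e2  + a.e2*b.s  + a.e1*b.e12 - a.e12*b.e1;
        r.e12 = a.s*b.e12 + a.e12*b.s + a.e1*b.e2  - a.e2*b.e1;
        return r;
    }

    int main(void)
    {
        mv2 e1 = {0.0f, 1.0f, 0.0f, 0.0f};
        mv2 e2 = {0.0f, 0.0f, 1.0f, 0.0f};
        mv2 c  = geometric_product(e1, e2);   /* expected result: the bivector e12 */
        printf("%g + %g e1 + %g e2 + %g e12\n", c.s, c.e1, c.e2, c.e12);
        return 0;
    }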
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing
In numerical computations, the precision of floating-point operations is a key factor in determining both performance (speed and energy efficiency) and reliability (accuracy and reproducibility). However, the two goals generally pull precision in opposite directions. The ultimate concept for maximizing both at the same time is therefore minimal-precision computing through precision-tuning, which selects the optimal precision for each operation and each datum. Several studies have already addressed precision-tuning (e.g. Precimonious and Verrou), but their scope is limited to the tuning step alone. Hence, we aim to propose a broader concept of a minimal-precision computing system with precision-tuning, involving both the hardware and the software stack.
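As a purely illustrative example (not part of the white paper), the C program below accumulates the same series in single and double precision. The lower precision is typically cheaper in time and energy, but it loses several digits here, which is exactly the per-operation trade-off that precision-tuning tries to resolve.

    #include <stdio.h>

    int main(void)
    {
        float  sum_f = 0.0f;
        double sum_d = 0.0;

        /* Accumulate the harmonic series; the float accumulator stops
         * absorbing terms once they fall below one ulp of the running sum. */
        for (int i = 1; i <= 10000000; i++) {
            sum_f += 1.0f / (float)i;
            sum_d += 1.0  / (double)i;
        }
        printf("single precision: %.7f\n",  sum_f);
        printf("double precision: %.15f\n", sum_d);
        return 0;
    }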
In 2019, we started the Minimal-Precision Computing project to realize this broader concept. Specifically, our
system combines (1) a precision-tuning method based on Discrete Stochastic
Arithmetic (DSA), (2) arbitrary-precision arithmetic libraries, (3) fast and
accurate numerical libraries, and (4) Field-Programmable Gate Array (FPGA) with
High-Level Synthesis (HLS).
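As a rough illustration of the idea behind component (1), the sketch below reruns the same single-precision summation under different IEEE rounding modes and uses the spread of the results to estimate how many decimal digits can be trusted. This is only a toy stand-in for Discrete Stochastic Arithmetic, not the CADNA/Verrou machinery, and it may need compiler support for runtime rounding modes (e.g. -frounding-math).

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    /* Sum the series under a chosen rounding mode, then restore to-nearest. */
    static float sum_series(int rounding_mode)
    {
        fesetround(rounding_mode);
        float s = 0.0f;
        for (int i = 1; i <= 1000000; i++)
            s += 1.0f / (float)i;
        fesetround(FE_TONEAREST);
        return s;
    }

    int main(void)
    {
        float up   = sum_series(FE_UPWARD);
        float down = sum_series(FE_DOWNWARD);
        float near = sum_series(FE_TONEAREST);

        /* The spread between rounding modes bounds the accumulated rounding
         * error; log10(|value| / spread) estimates the reliable digits. */
        float  spread = fabsf(up - down);
        double digits = spread > 0.0f ? log10(fabs((double)near) / spread) : 7.0;

        printf("result %.7f, spread %.2e, about %.1f reliable digits\n",
               near, spread, digits);
        return 0;
    }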
In this white paper, we provide an overview of various technologies related to minimal- and mixed-precision computing, outline the future direction of the project, and discuss current challenges together with our project members and the guest speakers of the LSPANC 2020 workshop:
https://www.r-ccs.riken.jp/labs/lpnctrt/lspanc2020jan/
An Execution Model and High-Level-Synthesis System for Generating SIMT Multi-Threaded Hardware from C Source Code
The performance improvement of conventional processors has begun to stagnate in recent years. Because of this, researchers are looking for new ways to improve the performance of computing systems.
Heterogeneous systems have turned out to be a powerful option. In the context of this thesis, a heterogeneous system consists of a software-programmable processor and an FPGA-based configurable hardware accelerator. By using an accelerator specifically tailored to a particular application, heterogeneous systems can achieve higher performance than conventional processors.
Due to their increased complexity, it is more complicated to develop applications for heterogeneous systems than for conventional systems based on a software-programmable processor. Different languages have to be used to program the software and hardware parts, and additional specialised hardware knowledge is required. Both factors increase the development cost.
This work presents the compiler framework Nymble, which makes it possible to program a heterogeneous system with only a single high-level language.
In this high-level language, the developer only has to select which parts of the application should be executed in hardware. Nymble then generates a program for the software processor, the configuration of the hardware, and all interfaces between software and hardware.
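The abstract does not show Nymble's concrete annotation syntax, so the pragma in the sketch below is purely hypothetical; it only illustrates the idea that one marked C function is turned into an accelerator while the rest of the program stays in software. A standard C compiler will simply ignore the unknown pragma.

    #include <stdio.h>

    /* Hypothetical marker, not actual Nymble syntax: the function below is
     * selected for hardware execution; the compiler would generate the
     * accelerator, the host code and the software/hardware interfaces. */
    #pragma nymble hardware
    static void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        float x[1024], y[1024];
        for (int i = 0; i < 1024; i++) { x[i] = (float)i; y[i] = 1.0f; }
        saxpy(1024, 2.0f, x, y);      /* would run on the FPGA accelerator */
        printf("y[10] = %g\n", y[10]);
        return 0;
    }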
All heterogeneous systems supported by Nymble have in common that the software and hardware parts of an application have access to a shared memory. As this memory is external RAM with a high access latency, it is necessary to insert a cache between the memory and the hardware. With this cache, memory accesses can have either a very short or a very long latency, depending on whether the data is available in the cache.
To hide long latencies, this thesis presents a novel execution model which allows the simultaneous execution of multiple threads in a single accelerator. Additionally, the model enables threads to be dynamically reordered at specific points in the common accelerator pipeline. This capability is used to let other (non-waiting) threads overtake a thread which is waiting for a memory access. Thus, these other threads can execute their calculations independently of the waiting thread to bridge the latency of memory accesses.
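The following C program is only a software analogy of that latency-hiding idea (the thesis performs the reordering inside a single accelerator pipeline, not with operating-system threads): one worker blocks on a simulated memory access while the other, independent workers keep computing, so the long latency is bridged by their work. Build with -pthread.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        long id = (long)arg;
        if (id == 0) {
            /* Thread 0 "waits for memory": a 100 ms stall stands in for a
             * cache miss that goes out to external RAM. */
            usleep(100000);
            printf("thread 0: data arrived\n");
        } else {
            /* The other threads overtake it and do independent work. */
            double acc = 0.0;
            for (long i = 1; i <= 5000000; i++)
                acc += 1.0 / (double)i;
            printf("thread %ld: partial result %.6f\n", id, acc);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (long i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }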
Previous works use execution models that allow only a single thread to be active in the accelerator. In the case of a memory access with a long latency, that thread is exchanged for another, non-waiting thread. This design often leaves many hardware resources idle for a significant amount of time.
In contrast, the presented novel execution model dynamically spreads multiple threads over the pipeline, which uses the available resources more effectively and results in higher utilisation. Furthermore, the simultaneous execution of multiple threads can achieve a throughput similar to that of multiple copies of a single-threaded accelerator running in parallel.
The new execution model thus makes it possible to combine the improved throughput of multiple copies with the increased efficiency of simultaneous threads in a single accelerator. Thread reordering allows the new model to be used effectively with a cached shared memory.
Comparing four copies of a single-threaded accelerator with a multi-threaded accelerator running four threads (both created by Nymble), a resource-efficiency improvement of up to 2.6x is achieved. At the same time, four simultaneous threads can be up to 4x as fast as four threads executed consecutively on a single accelerator. Compared to other, more optimised compilers, Nymble can still achieve up to 2x faster runtimes at 1.5x resource efficiency.
Accelerating High-Level Engineering Computations by Automatic Compilation of Geometric Algebra to Hardware Accelerators
Geometric Algebra (GA), a generalization of quaternions, is a very powerful framework for intuitively expressing and manipulating the complex geometric relationships common to engineering problems. The actual evaluation of GA expressions, though, is extremely compute intensive due to the high dimensionality of the data being processed. On standard desktop CPUs, GA evaluations take considerably longer than conventional mathematical formulations. GPUs do offer sufficient throughput to make the use of concise GA formulations practical, but require power far exceeding the budgets of most embedded applications. While the suitability of low-power reconfigurable accelerators for evaluating specific GA computations has already been demonstrated, these often required a significant manual design effort. We present a proof-of-concept compile flow combining symbolic and hardware optimization techniques to automatically generate, without user intervention, accelerators from abstract GA descriptions that are suitable for high-performance embedded computing.
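To hint at what the symbolic step can contribute (the paper's actual flow is not reproduced here; the names below are ours), the C sketch specializes the dense 2D multivector product to operands known to be pure vectors: only the scalar and bivector coefficients survive, so the generated datapath needs four multiplications instead of sixteen.

    #include <stdio.h>

    /* Product of two pure vectors in 2D: a scalar plus a bivector. */
    typedef struct { float s, e12; } rotor2;

    /* Specialized product of a1*e1 + a2*e2 and b1*e1 + b2*e2. */
    static rotor2 vector_product(float a1, float a2, float b1, float b2)
    {
        rotor2 r;
        r.s   = a1*b1 + a2*b2;   /* symmetric (inner) part */
        r.e12 = a1*b2 - a2*b1;   /* antisymmetric (outer) part */
        return r;
    }

    int main(void)
    {
        rotor2 r = vector_product(1.0f, 0.0f, 0.0f, 1.0f);  /* e1 times e2 */
        printf("%g + %g e12\n", r.s, r.e12);                 /* prints 0 + 1 e12 */
        return 0;
    }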
Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations
We propose a new systematic approach for minimal-precision computations. This approach is reliable, general, comprehensive, high-performance, and realistic. Although the proposed system is still in development, this presentation shows that the system could be constructed by combining already available in-house technologies as well as extending them.