Acceleration of Multiresolution Imaging Algorithms: A Comparative Study
Abstract: In this paper we consider a multiresolution filter and its realization on the Cell BE and GPUs. We present common and architecture-specific optimization strategies for obtaining maximum performance on these architectures, and show how to obtain speedups of 6.57x and 33.24x over an optimized OpenMP baseline implementation. Furthermore, we undertake an automated configuration space exploration of different partitioning possibilities to select the best tiling parameters.
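The configuration space exploration mentioned in this abstract can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the abstract does not give the paper's kernel, search strategy, or parameter ranges, so a 3x3 box sum stands in for the multiresolution filter, and the hypothetical helpers `box_sum_tiled` and `explore` simply time every candidate tile shape and keep the fastest.

```python
import itertools
import time


def box_sum_tiled(img, tile_h, tile_w):
    """Sum each pixel's 3x3 neighbourhood, visiting the image tile by tile.

    Stand-in kernel (assumption): not the filter used in the paper.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for ty in range(0, h, tile_h):          # iterate over tile rows
        for tx in range(0, w, tile_w):      # iterate over tile columns
            for y in range(ty, min(ty + tile_h, h)):
                for x in range(tx, min(tx + tile_w, w)):
                    acc = 0
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if 0 <= yy < h and 0 <= xx < w:
                                acc += img[yy][xx]
                    out[y][x] = acc
    return out


def explore(img, tile_heights, tile_widths, repeats=3):
    """Time every (tile_h, tile_w) pair; return the fastest pair and all timings."""
    timings = {}
    for th, tw in itertools.product(tile_heights, tile_widths):
        best = float("inf")
        for _ in range(repeats):            # keep the best of several runs
            start = time.perf_counter()
            box_sum_tiled(img, th, tw)
            best = min(best, time.perf_counter() - start)
        timings[(th, tw)] = best
    return min(timings, key=timings.get), timings


image = [[(x * y) % 7 for x in range(32)] for y in range(32)]
best_tile, all_timings = explore(image, [4, 8, 16], [4, 8, 16])
print("fastest tile shape on this machine:", best_tile)
```

The winning tile shape depends on the machine and the run, which is why the selection is automated rather than fixed by hand. An exhaustive sweep like this is only practical for small parameter spaces.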
FSM-controlled architectures for linear invasion
Abstract: Invasive computing is a novel concept in multiprocessor architecture and programming. Invasion will become an important step towards the self-organizing behavior that will be needed in the next generation of massively parallel MPSoCs, for which achieving unrivaled performance and resource efficiency is, apart from programming, one of the main challenges. In this paper we introduce and model a finite state machine for controlling the invasive process at different architectural granularities. The applicability of our FSM is tested in case studies for a reconfigurable MPSoC platform and a fine-grained platform. The results show substantial flexibility gains at only marginal additional hardware cost.
FLOWER: A comprehensive dataflow compiler for high-level synthesis
FPGAs have found their way into data centers as accelerator cards, making
reconfigurable computing more accessible for high-performance applications. At
the same time, new high-level synthesis compilers like Xilinx Vitis and runtime
libraries such as XRT attract software programmers into the reconfigurable
domain. While software programmers are familiar with task-level and
data-parallel programming, FPGAs often require different types of parallelism.
For example, data-driven parallelism is mandatory to obtain satisfactory
hardware designs for pipelined dataflow architectures. However, software
programmers are often not acquainted with dataflow architectures, resulting in
poor hardware designs.
In this work we present FLOWER, a comprehensive compiler infrastructure that
provides automatic canonical transformations for high-level synthesis from a
domain-specific library. This allows programmers to focus on algorithm
implementations rather than low-level optimizations for dataflow architectures.
We show that FLOWER enables the synthesis of efficient implementations for
high-performance streaming applications targeting System-on-Chip and FPGA
accelerator cards, in the context of image processing and computer vision.
AnySeq: A high performance sequence alignment library based on partial evaluation
Sequence alignments are fundamental to bioinformatics, which has resulted in a
variety of optimized implementations. Unfortunately, the vast majority of them
are hand-tuned and specific to certain architectures and execution models. This
not only makes them challenging to understand and extend, but also difficult to
port to other platforms. We present AnySeq, a novel library for computing
different types of pairwise alignments of DNA sequences. Our approach combines
high performance with an intuitively understandable implementation, which is
achieved through the concept of partial evaluation. Using the AnyDSL compiler
framework, AnySeq enables the compilation of algorithmic variants that are
highly optimized for specific usage scenarios and hardware targets with a
single, uniform codebase. The resulting domain-specific library thus allows the
variation of alignment parameters (such as alignment type, scoring scheme, and
traceback vs. plain score) by simple function composition rather than
metaprogramming techniques which are often hard to understand. Our
implementation supports multithreading and SIMD vectorization on CPUs,
CUDA-enabled GPUs, and FPGAs. AnySeq is at most 7% slower and in many cases
faster (up to 12%) than state-of-the-art manually optimized alignment libraries
on CPUs (SeqAn) and on GPUs (NVBio).
Comment: To be published in IPDPS 2020. This work is supported by the Federal
Ministry of Education and Research (BMBF) as part of the MetaDL, Metacca, and
ProThOS projects as well as by the Intel Visual Computing Institute (IVCI)
and Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at
Saarland Universit
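
The AnySeq abstract describes varying alignment parameters (alignment type, scoring scheme) by function composition. The following is a minimal Python sketch of that idea only, not AnySeq's actual API: AnySeq is built on the AnyDSL framework and relies on partial evaluation to specialize the composed code, which plain Python closures do not do. The names `simple_scoring` and `make_aligner`, the linear gap penalty, and the score-only output (no traceback) are all assumptions made for illustration.

```python
from typing import Callable

# A scoring scheme maps a pair of characters to a substitution score.
Scoring = Callable[[str, str], int]


def simple_scoring(match: int, mismatch: int) -> Scoring:
    """Build a match/mismatch scoring scheme (hypothetical helper)."""
    return lambda a, b: match if a == b else mismatch


def make_aligner(subst: Scoring, gap: int, local: bool) -> Callable[[str, str], int]:
    """Compose a score-only pairwise aligner with a linear gap penalty.

    local=False gives a global (Needleman-Wunsch style) score,
    local=True gives a local (Smith-Waterman style) score.
    """
    # The alignment type is injected as a function applied to every cell.
    clamp = (lambda s: max(s, 0)) if local else (lambda s: s)

    def align(query: str, subject: str) -> int:
        prev = [clamp(j * gap) for j in range(len(subject) + 1)]
        best = 0
        for i in range(1, len(query) + 1):
            cur = [clamp(i * gap)]
            for j in range(1, len(subject) + 1):
                cell = max(
                    prev[j - 1] + subst(query[i - 1], subject[j - 1]),  # (mis)match
                    prev[j] + gap,      # gap in subject
                    cur[j - 1] + gap,   # gap in query
                )
                cur.append(clamp(cell))
            best = max(best, max(cur))
            prev = cur
        # Local alignment reports the best cell, global the bottom-right cell.
        return best if local else prev[-1]

    return align


scoring = simple_scoring(match=1, mismatch=-1)
global_score = make_aligner(scoring, gap=-1, local=False)
local_score = make_aligner(scoring, gap=-1, local=True)

print(global_score("XXACGTXX", "YYACGTYY"))  # 0
print(local_score("XXACGTXX", "YYACGTYY"))   # 4
```

Switching the alignment type or the scoring scheme changes only the arguments passed to `make_aligner`, while the dynamic-programming loop stays the same.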