Search CORE

1,419 research outputs found

Architecture Design Space Exploration for Streaming Applications Through Timing Analysis

Author: Jansen Pierre G.
Kavaldjiev Nikolay
Smit Gerard J.M.
Wiggers Maarten H.
Publication venue: University of Twente, Centre for Telematica and Information Technology (CTIT)
Publication date: 01/01/2005
Field of study

In this paper we compare the maximum achievable throughput of different memory organisations of the processing elements that constitute a multiprocessor system on chip. This is done by modelling the mapping of a task with input and output channels on a processing element as a homogeneous synchronous dataflow graph, and use maximum cycle mean analysis to derive the throughput. In a HiperLAN2 case study we show how these techniques can be used to derive the required clock frequency and communication latencies in order to meet the application's throughput requirement on a multiprocessor system on chip that has one of the investigated memory organisations

CiteSeerX

University of Twente Research Information

Mapping Applications to an FPFA Tile

Author: Gerard J. M. Smit
Michel Rosien Yuanqing
Michèl A. J. Rosien
Thijs Krol
Yuanqing Guo
Publication venue: IEEE Computer Society
Publication date: 01/01/2003
Field of study

This paper introduces a transformational design method which can be used to map code written in a high level source language, like C, to a coarse grain reconfigurable architecture. The source code is first translated into a control data flow graph (CDFG), which is minimized using a set of behaviour preserving transformations, such as dependency analysis, common subexpression elimination, etc. After applying graph clustering, scheduling and allocation transformations on this minimized graph, it can be mapped onto the target architecture

CiteSeerX

University of Twente Research Information

Compiling dataflow graphs into hardware

Author: Rinker Robert E.
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2005
Field of study

Department Head: L. Darrell Whitley.2005 Fall.Includes bibliographical references (pages 121-126).Conventional computers are programmed by supplying a sequence of instructions that perform the desired task. A reconfigurable processor is "programmed" by specifying the interconnections between hardware components, thereby creating a "hardwired" system to do the particular task. For some applications such as image processing, reconfigurable processors can produce dramatic execution speedups. However, programming a reconfigurable processor is essentially a hardware design discipline, making programming difficult for application programmers who are only familiar with software design techniques. To bridge this gap, a programming language, called SA-C (Single Assignment C, pronounced "sassy"), has been designed for programming reconfigurable processors. The process involves two main steps - first, the SA-C compiler analyzes the input source code and produces a hardware-independent intermediate representation of the program, called a dataflow graph (DFG). Secondly, this DFG is combined with hardware-specific information to create the final configuration. This dissertation describes the design and implementation of a system that performs the DFG to hardware translation. The DFG is broken up into three sections: the data generators, the inner loop body, and the data collectors. The second of these, the inner loop body, is used to create a computational structure that is unique for each program. The other two sections are implemented by using prebuilt modules, parameterized for the particular problem. Finally, a "glue module" is created to connect the various pieces into a complete interconnection specification. The dissertation also explores optimizations that can be applied while processing the DFG, to improve performance. A technique for pipelining the inner loop body is described that uses an estimation tool for the propagation delay of the nodes within the dataflow graph. A scheme is also described that identifies subgraphs with the dataflow graph that can be replaced with lookup tables. The lookup tables provide a faster implementation than random logic in some instances

Mountain Scholar (Digital Collections of Colorado and Wyoming)

A Comparative Study of Scheduling Techniques for Multimedia Applications on SIMD Pipelines

Author: Arslan Mehmet Ali
Gruian Flavius
Kuchcinski Krzysztof
Publication venue
Publication date: 01/01/2015
Field of study

Parallel architectures are essential in order to take advantage of the parallelism inherent in streaming applications. One particular branch of these employ hardware SIMD pipelines. In this paper, we analyse several scheduling techniques, namely ad hoc overlapped execution, modulo scheduling and modulo scheduling with unrolling, all of which aim to efficiently utilize the special architecture design. Our investigation focuses on improving throughput while analysing other metrics that are important for streaming applications, such as register pressure, buffer sizes and code size. Through experiments conducted on several media benchmarks, we present and discuss trade-offs involved when selecting any one of these scheduling techniques.Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241

arXiv.org e-Print Archive

Lund University Publications

FPGA acceleration of DNA sequence alignment: design analysis and optimization

Author: Ng Ho-Cheung
Publication venue: Computing, Imperial College London
Publication date: 01/01/2022
Field of study

Existing FPGA accelerators for short read mapping often fail to utilize the complete biological information in sequencing data for simple hardware design, leading to missed or incorrect alignment. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques: FM-index and the Smith-Waterman algorithm with the affine-gap model which are commonly used in short read mapping. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows. 1. We accelerate the exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy. 2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with comparable accuracy versus state-of-the-art software. 3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. Then we apply this approach to optimize one of the proposed FPGA aligners for better alignment performance.Open Acces

Spiral - Imperial College Digital Repository