53 research outputs found

Multi-processor system-level synthesis for multiple applications on platform fpga

Author: Akash Kumar
Bart Mesman
Henk Corporaal
Shakith Fernando
Yajun Ha
Publication date: 01/01/2007

Multiprocessor systems-on-chip (MPSoCs) are being developed in increasing numbers to support the large number of applications running on modern embedded systems. Designing and programming such systems proves to be a major challenge. Most current design methodologies rely on creating the design by hand and are therefore error-prone and time-consuming. This also limits the number of design points that can be explored. While some efforts have been made to automate the flow and raise the abstraction level, these are still limited to single-application designs. In this paper, we present a design methodology to generate and program MPSoC designs in a systematic and automated way for multiple applications. The architecture is automatically inferred from the application specifications and customized for them. The flow is ideal for fast design space exploration (DSE) in MPSoC systems. We present results of a case study to compute the buffer-throughput trade-offs in real-life applications, H263 and JPEG decoders. The generation of the entire project takes about 100 ms, and the whole DSE was completed in 45 minutes, including the FPGA mapping and synthesis.

CiteSeerX

Chapter 4 DATAFLOW ANALYSIS FOR REAL-TIME EMBEDDED MULTIPROCESSOR SYSTEM DESIGN

Author: Bart Mesman
Sander Stuijk
Jan David Mol
Jef Van Meerbergen
Marco Bekooij
O Moreira
Peter Poplavko
Rob Hoes

Dataflow analysis techniques are key to reducing the number of design iterations and shortening the design time of real-time embedded network-based multiprocessor systems that process data streams. With these analysis techniques, the worst-case end-to-end temporal behavior of hard real-time applications can be derived from a dataflow model in which computation, communication, and arbitration are modeled. For soft real-time applications, these static dataflow analysis techniques are combined with simulation of the dataflow model to test statistical assertions about their temporal behavior. The simulation results, in combination with properties of the dataflow model, are used to derive the sensitivity of design parameters and to estimate parameters such as the capacity of data buffers.

Keywords: real-time, dataflow analysis, multiprocessor system, predictable design, system-on-chip

CiteSeerX

Embedded Systems Roadmap 2002: Vision on technology for the future of PROGRESS

Author: Brinksma Ed
Deprettere Ed
Eggermont Ludwig D.J.
Hendriksen Wim
Krol Thijs
Mesman Bart
Spaanenburg Ben
Timmer Floris
van Gageldonk Hans
van Leuken René
Verhulst Erik
Publication venue: 'Towarzystwo Naukowe W Toruniu'
Publication date: 01/03/2002

University of Twente Research Information

Constraint analysis for DSP code generation

Author: Bart Mesman
Publication venue: Universiteit Eindhoven

+113 pp.; 24c

uilis.unsyiah.ac.id

Static Resource Models for Code-Size Efficient Embedded Processors

Author: Bart Mesman
Qin Zhao

CiteSeerX

Practical Instruction Set Design and Compiler Retargetability Using Static Resource Models

Author: Bart Mesman
Qin Zhao
Twan Basten
Publication date: 01/01/2002

The design of application(-domain)-specific instruction-set processors (ASIPs), optimized for code size, has traditionally been accompanied by the necessity to program in assembly, at least for the performance-critical parts of the application. The highly encoded instruction sets simply lack the orthogonal structure that is present in, for example, VLIW processors and that allows efficient compilation. This lack of efficient compilation tools has also severely hampered the design space exploration of code-size-efficient instruction sets and, correspondingly, their tuning to the application domain. In [13], a practical method is demonstrated to model a broad class of highly encoded instruction sets in terms of virtual resources that are easily interpreted by classic resource-constrained schedulers (such as the popular list-scheduling algorithm), thereby allowing efficient compilation with well-understood compilation tools. In this paper, we demonstrate the suitability of this model to also enable instruction set design(-space exploration) with a simple, well-understood, and proven method long used in the High-Level Synthesis (HLS) of ASICs. A small case study proves the practical applicability of the method.
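For readers unfamiliar with the list-scheduling algorithm named in this abstract, the following is a minimal sketch of a generic resource-constrained list scheduler. It is not the paper's tool or its virtual-resource model: the operation names, the resource classes (`alu`, `mul`), the unit latency, and the alphabetical priority order are all assumptions made for illustration.

```python
from collections import defaultdict


def list_schedule(ops, deps, capacity):
    """Greedy resource-constrained list scheduling, assuming unit-latency ops.

    ops:      dict mapping operation name -> resource class it occupies
    deps:     dict mapping operation name -> set of predecessor names
    capacity: dict mapping resource class -> units available per cycle
    Returns a dict mapping operation name -> start cycle.
    """
    start = {}
    remaining = set(ops)
    cycle = 0
    while remaining:
        used = defaultdict(int)
        # An operation is ready once every predecessor started in an earlier cycle.
        ready = sorted(  # alphabetical order stands in for a real priority function
            op for op in remaining
            if all(p in start and start[p] < cycle for p in deps.get(op, ()))
        )
        scheduled_now = []
        for op in ready:
            res = ops[op]
            if used[res] < capacity[res]:
                used[res] += 1
                start[op] = cycle
                scheduled_now.append(op)
        if not scheduled_now:
            raise ValueError("no progress: cyclic dependencies or zero capacity")
        remaining -= set(scheduled_now)
        cycle += 1
    return start


# Hypothetical four-operation example: c needs a and b, d needs c.
ops = {"a": "alu", "b": "alu", "c": "mul", "d": "alu"}
deps = {"c": {"a", "b"}, "d": {"c"}}
one_alu = list_schedule(ops, deps, {"alu": 1, "mul": 1})
two_alus = list_schedule(ops, deps, {"alu": 2, "mul": 1})
```

With a single ALU the two independent operations are serialized; with two ALUs they share a cycle and the schedule shortens by one cycle, which is the kind of resource-versus-length trade-off such schedulers expose.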

CiteSeerX

Skeleton-based automatic parallelization of image processing algorithms for GPUs

Author: Bart Mesman
Cedric Nugteren
Henk Corporaal
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

Graphics Processing Units (GPUs) are becoming increasingly important in high performance computing. To maintain high quality solutions, programmers have to efficiently parallelize and map their algorithms. This task is far from trivial, leading to the necessity to automate this process. In this paper, we present a technique to automatically parallelize and map sequential code on a GPU, without the need for code annotations. This technique is based on skeletonization and is targeted at image processing algorithms. Skeletonization separates the structure of a parallel computation from the algorithm's functionality, enabling efficient implementations without requiring architecture knowledge from the programmer. We define a number of skeleton classes, each enabling GPU-specific parallelization techniques and optimizations, including automatic thread creation, on-chip memory usage and memory coalescing. Recently, similar skeletonization techniques have been applied to GPUs. Our work uses domain-specific skeletons and a finer-grained classification of algorithms. Comparing skeleton-based parallelization to existing GPU code generators in general, we potentially achieve a higher hardware efficiency by enabling algorithm restructuring through skeletons. In a set of benchmarks, we show that the presented skeleton-based approach generates highly optimized code, achieving high data throughput. Additionally, we show that the automatically generated code performs close or equal to manually mapped and optimized code. We conclude that skeleton-based parallelization for GPUs is promising, but we do believe that future research must focus on the identification of a finer-grained and complete classification.
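The separation of structure from functionality described in this abstract can be illustrated with a small sketch. It is a sequential Python stand-in, not the paper's GPU code generator: the skeleton names `pixel_to_pixel` and `neighbourhood_to_pixel`, the clamped border handling, and the sample image are assumptions made for illustration. Each skeleton fixes the access pattern, and the caller supplies only the per-pixel function.

```python
def pixel_to_pixel(image, fn):
    """Skeleton: each output pixel depends only on the same input pixel."""
    return [[fn(p) for p in row] for row in image]


def neighbourhood_to_pixel(image, fn, radius=1):
    """Skeleton: each output pixel depends on a square window around it.

    Coordinates outside the image are clamped to the nearest border pixel.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            row.append(fn(window))
        out.append(row)
    return out


# The same skeletons are reused with different user functions.
image = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
thresholded = pixel_to_pixel(image, lambda p: 255 if p >= 40 else 0)
dilated = neighbourhood_to_pixel(image, max)
```

Because every iteration of a skeleton's loops is independent, a generator that recognizes the skeleton class could in principle assign one thread per output pixel; the user functions (`lambda` threshold, `max`) never need to mention threads or memory layout.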

CiteSeerX

Crossref

A probabilistic approach to model resource contention for performance estimation of multi-featured media devices

Author: Akash Kumar
Bart Mesman
Bart Theelen
Henk Corporaal
Yajun Ha
Publication date: 01/01/2007

The number of features that are supported in modern multimedia devices is increasing faster than ever. Estimating the performance of such applications when they are running on shared resources is becoming increasingly complex. Simulation of all possible use-cases is very time-consuming and often undesirable. In this paper, a new technique is proposed based on probabilistically estimating the performance of concurrently executing applications that share resources. Two different methods of employing this approach are presented and compared with a state-of-the-art technique and with the achieved performance found through extensive simulations. The results are within 15% of the simulation results (considered as the reference case) and up to ten times better than a worst-case estimation approach. The approach scales very well with an increasing number of applications and can also be applied at run-time for admission control.

CiteSeerX