Search CORE

3,647 research outputs found

Performance and Interface Buffer Size Driven Behavioral Partitioning for Embedded Systems

Author: Cyre W. R.
Lin Ta-Cheng
Sait Sadiq M.
Publication venue
Publication date: 01/04/1998
Field of study

One of the major differences in partitioning for codesign is in the way the communication cost is evaluated. Generally the size of the edge cut-set is used. When communication between components is through buffered channels, the size of the edge cut-set is not adequate to estimate the buffer size. A second important factor to measure the quality of partitioning is the system delay. Most partitioning approaches use the number of nodes/functions in each partition as constraints and attempt to minimize the communication cost. The data dependencies among nodes/functions, and their delays are not considered. In this paper we present partitioning with two objectives: (1) buffer size, which is estimated by analyzing the data flow patterns of the CDFG, and solved as a clique partitioning problem, and (2) the system delay that is estimated using List Scheduling. We pose the problem as a combinatorial optimization and use an efficient non-deterministic search algorithm called Problem-Space Genetic Algorithm to search for the optimum. Experimental results indicate that, according to a proposed quality metric, our approach can attain an average 87% of the optimum for two-way partitioning

KFUPM ePrints

Programming MPSoC platforms: Road works ahead

Author: Bekooij Marco
Domer Rainer
Leupers Rainer
Nohl Achim
Soonhoi Ha
Vajda Andras
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2009
Field of study

This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer´s viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful\ud deployment of heterogeneous embedded MPSoC. On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool and code replacement strategy

Publikationsserver der RWTH Aachen University

University of Twente Research Information

Recommended from our members

An intelligent component database for behavioral synthesis

Author: Chen Gwo-Dong
Gajski Daniel D.
Publication venue: eScholarship, University of California
Publication date: 10/11/1989
Field of study

This paper describes an intelligent component database system that delivers components to synthesis tools when given a set of attributes and constraints. Requirements of a component server are defined and an implementation is described. Our experiments demonstrate that such a component sever can replace component libraries and component catalogs with hundreds of pages

eScholarship - University of California

High-level Synthesis for Semi-global Matching: Is the juice worth the squeeze?

Author: Gregoretti Francesco
Lavagno Luciano
Lazarescu MIHAI TEODOR
Muslim FAHAD BIN
Qamar Affaq
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Methodology and optimizing of multiple frame format buffering within FPGA H.264/AVC decoder with FRExt.

Author: Stotts Timothy Aaron
Publication venue: RIT Scholar Works
Publication date: 01/08/2007
Field of study

Digital representation of video data is an inherently resource demanding problem that continues to necessitate the development and refinement of coding methods. The H.264/AVC standard, along with its recent Fidelity Range Extensions amendment (FRExt), is quickly being adopted as the standard codec for broadcast and distribution of high definition video. The FRExt amendment, while not necessarily affecting the overall decoder architecture, presents an added complexity of providing efficient memory management for buffering intermediate frames of various pixel color samplings and depths. This thesis evaluated the role of designing the frame buffer of a hardware video decoder, with integrated support for the H.264/AVC codec plus FRExt. With focus on organizing external memory data access, the frame buffer was designed to provide intermediate data storage for the decoder, while using an efficient store and load scheme that takes into consideration each frame pixel format of the video data. VHDL was used to model the frame buffer. Exploitation of reconfigurability and post-synthesis FPGA simulations were used to evaluate behavior, scalability and power consumption, while providing an analysis of approaches to adding FRExt to the memory management. Real-time buffer performance was achieved for two common frame formats at 1080 HD resolution; and an innovative pipeline design provides dynamic switching of formats between video sequences. As an additional consequence of verifying the model, a preexisting Baseline H.264/AVC decoder testbench was augmented to support testing of multiple frame formats

RIT Scholar Works

System specification and performance analysis

Author: Benders L.P.M.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1996
Field of study

Repository TU/e

Pure OAI Repository

Automatic mapping of graphical programming applications to microelectronic technologies

Author: Ong Sze-Wei
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2001
Field of study

Adaptive computing systems (ACSs) and application-specific integrated circuits (ASICs) can serve as flexible hardware accelerators for applications in domains such as image processing and digital signal processing. However, the mapping of applications onto ACSs and ASICs using the traditional methods can take months for a hardware engineer to develop and debug. In this dissertation, a new approach for automatic mapping of software applications onto ACSs and ASICs has been developed, implemented and validated. This dissertation presents the design flow of the software environment called CHAMPION, which is being developed at the University of Tennessee. This environment permits high-level design entry using the Cantata graphical programming software fromKRI. Using Cantata as the design entry, CHAMPION hides from the user the low-level details of the hardware architecture and the finer issues of application mapping onto the hardware. Validation of the CHAMPION environment was performed using multiple applications of moderate complexity. In one case, theapplication mapping time which required six weeks to perform manually took only six minutes for CHAMPION, yet comparable results were produced. Furthermore, the CHAMPION environment was constructed such that retargeting to a new adaptive computing system could be accomplished in just a few hours as opposed to weeks using manual methods. Thus, CHAMPION permits both ACSs and ASICs to be utilized by a wider audience and application development accomplished in less time

University of Tennessee, Knoxville: Trace

Framework for simulation of fault tolerant heterogeneous multiprocessor system-on-chip

Author: Tafesse Bisrat
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2008
Field of study

Due to the ever growing requirement in high performance data computation, current Uniprocessor systems fall short of hand to meet critical real-time performance demands in (i) high throughput (ii) faster processing time (iii) low power consumption (iv) design cost and time-to-market factors and more importantly (v) fault tolerant processing. Shifting the design trend to MPSOCs is a work-around to meet these challenges. However, developing efficient fault tolerant task scheduling and mapping techniques requires optimized algorithms that consider the various scenarios in Multiprocessor environments. Several works have been done in the past few years which proposed simulation based frameworks for scheduling and mapping strategies that considered homogenous systems and error avoidance techniques. However, most of these works inadequately describe today\u27s MPSOC trend because they were focused on the network domain and didn\u27t consider heterogeneous systems with fault tolerant capabilities; In order to address these issues, this work proposes (i) a performance driven scheduling algorithm (PD SA) based on simulated annealing technique (ii) an optimized Homogenous-Workload-Distribution (HWD) Multiprocessor task mapping algorithm which considers the dynamic workload on processors and (iii) a dynamic Fault Tolerant (FT) scheduling/mapping algorithm to employ robust application processing system. The implementation was accompanied by a heterogeneous Multiprocessor system simulation framework developed in systemC/C++. The proposed framework reads user data, set the architecture, execute input task graph and finally generate performance variables. This framework alleviates previous work issues with respect to (i) architectural flexibility in number-of-processors, processor types and topology (ii) optimized scheduling and mapping strategies and (iii) fault-tolerant processing capability focusing more on the computational domain; A set of random as well as application specific STG benchmark suites were run on the simulator to evaluate and verify the performance of the proposed algorithms. The simulations were carried out for (i) scheduling policy evaluation (ii) fault tolerant evaluation (iii) topology evaluation (iv) Number of processor evaluation (v) Mapping policy evaluation and (vi) Processor Type evaluation. The results showed that PD scheduling algorithm showed marginally better performance than EDF with respect to utilization, Execution-Time and Power factors. The dynamic Fault Tolerant implementation showed to be a viable and efficient strategy to meet real-time constraints without posing significant system performance degradation. Torus topology gave better performance than Tile with respect to task completion time and power factors. Executing highly heterogeneous Tasks showed higher power consumption and execution time. Finally, increasing the number of processors showed a decrease in average Utilization but improved task completion time and power consumption; Based on the simulation results, the system designer can compare tradeoffs between a various design choices with respect to the performance requirement specifications. In general, designing an optimized Multiprocessor scheduling and mapping strategy with added fault tolerant capability will enable to develop efficient Multiprocessor systems which meet future performance goal requirements. This is the substance of this work

University of Nevada, Las Vegas Repository

High-speed dynamic partial reconfiguration for field programmable gate arrays

Author: Hoffman John
Publication venue: UNM Digital Repository
Publication date: 27/08/2009
Field of study

With dynamically and partially reconfigurable designs, it is necessary that the speed of the reconfiguration be accomplished in a time that is sufficiently small such that the operation of reconfiguration is not the limiting factor in the process. Therefore, the communication between the source of configuration and the configurable unit must be made as fast as possible. The aim of this work is to use an embedded controller internal to the FPGA to control the reconfiguration process and obtain the maximum speed at which reconfiguration can occur, with current FPGA technology. The use of Direct Memory Access (DMA) driven operations instead of the current arbitrated bus architectures yielded a 30% increase in the speed of reconfiguration compared to other methods such as OPB_HWICAP and PLB_HWICAP [1]. The use of interrupt driven partial reconfiguration was also introduced, allowing the processor to switch to other tasks during the reconfiguration operation. All of these contributions lead to significant performance improvements over current partial reconfiguration subsystems. The configuration controller was tested using four partially reconfigurable system implementations: (i) one targeting the Hard IP PowerPC405 on Virtex-4, (ii) a second targeting the Soft IP MicroBlaze on Virtex-5, (iii) a third targeting the Hard IP PowerPC440 on Virtex-5, and (iv) a fourth system targets the Hard IP PowerPC440 on Virtex-5 capable of adaptive feedback. The adaptive feedback Virtex-5 system can use internal voltage and temperature measurements from the Xilinx System Monitor IP to dynamically increase or decrease the speed of reconfiguration and/or change other reconfigurable aspects of the system to better match the environment