27 research outputs found

Execution-time Prediction for Dynamic Streaming Applications with Task-level Parallelism

Author: Basten AA Twan
Meerbergen J Jef van
Poplavko P Petro
Publication venue: IEEE Computer Society
Publication date: 01/01/2007

Programmable multiprocessor systems-on-chip are becoming the preferred implementation platform for embedded streaming applications. This enables using more software components, which leads to large and frequent dynamic variations of data-dependent execution times. In this context, accurate and conservative prediction of execution times helps in maintaining good audio/video quality and reducing energy consumption by dynamic evaluation of the amount of on-chip resources needed by applications. To be effective, multiprocessor systems have to employ the available parallelism. The combination of task-level parallelism and task delay variations makes predicting execution times a very hard problem. So far, under these conditions, no appropriate techniques exist for the conservative prediction of execution times with the required accuracy. In this paper, we present a novel technique for this problem, exploiting the concept of scenario-based prediction, and taking into account the transient and periodic behavior of scenarios and the effect of scenario transitions. In our MPEG-4 shape-decoder case study, we observe no more than 11% average overestimation.

Repository TU/e

Heterogeneous multiprocessor for the management of real-time video and graphics streams

Author: Meerbergen J Jef van
Roostelaar GJ van
Strik MTJ Marino
Timmer AH Adwin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2000

This paper presents an application-domain-driven approach to the design of embedded systems on silicon, and it shows how this approach is used to design a chip for a multi-window TV application. We discuss all major design steps in a logical order, starting with an application domain analysis. This leads to the choice of Kahn data flow graphs as the programming paradigm for high-throughput signal applications. Based on this analysis, we designed a multiprocessor architecture which uses run-time reconfiguration. Finally, attention is directed towards the physical implementation and the deep-submicron problems we had to solve. The result is a chip that can manage up to 25 internal real-time video streams. The chip combines the flexibility of a programmable solution with the cost effectiveness of a consumer product.

Repository TU/e

Pure OAI Repository

A scalable implementation of a reconfigurable WCDMA rake receiver

Author: Fifueras J.
Gerousis V.
Huisken Jos
Lindwer M.
Quax Marc
van Meerbergen Jef
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2004

The demands in terms of processing performance, communication bandwidth and real-time throughput of new-generation mobile communication applications (mobile and base stations) are much higher than today's programmable processing architectures can deliver. On the other hand, standards and market uncertainties, non-recurring engineering costs, and lack of access to (or knowledge of) application IP will require the next generation of embedded computing platforms to be fully programmable. In terms of silicon cost and power, practical yet fully programmable embedded computing platforms are enabled by reconfigurable processors that replace fixed ASICs in current standard platforms. This paper explains the concepts behind a novel reconfigurable WCDMA Rake receiver and gives benchmark results. The proposed Rake receiver enables a high-performance yet flexible computing platform for WCDMA.

Constraint analysis for DSP code generation.

Author: Jess JAG Jochen
Meerbergen J Jef van
Mesman B Bart
Timmer AH Adwin
Publication date: 01/01/2001

Repository TU/e

Pure OAI Repository

Constraint analysis for DSP code generation

Author: Jess JAG Jochen
Meerbergen J Jef van
Mesman B Bart
Timmer AH Adwin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1999

Code generation methods for digital signal processing (DSP) applications are hampered by the combination of tight timing constraints imposed by the performance requirements of DSP algorithms and resource constraints imposed by a hardware architecture. In this paper, we present a method for register binding and instruction scheduling based on the exploitation and analysis of the combination of resource and timing constraints. The analysis identifies implicit sequencing relations between operations in addition to the precedence constraints. Without the explicit modeling of these sequencing constraints, a scheduler is often not capable of finding a solution that satisfies the timing and resource constraints. The presented approach results in an efficient method to obtain high-quality instruction schedules with low register requirements.

Repository TU/e

Pure OAI Repository

On resource estimation of MPEG-4 video decoding for a multiprocessor architecture.

Author: Meerbergen J Jef van
Pastrnak M Milan
Poplavko P Petro
With PHN Peter de
Publication venue: STW Technology Foundation
Publication date: 01/01/2003

This paper addresses an efficient implementation of emerging video algorithms, such as the coding of arbitrarily shaped video objects in the new MPEG-4 standard. Advanced multimedia applications of this type pose challenging requirements on embedded systems design with respect to decomposition and scalability, in order to meet real-time constraints. We study the design of networks-on-chip (NoC), which intrinsically satisfy these requirements [5]. A job scheduler needs to know the worst-case execution time (WCET) of a starting job to ensure that the job can meet its timing constraints. For the purpose of timing analysis, such as computing the WCET, a timing model has been applied which has a linear dependence on a set of input-dependent data parameters. We derive a linear timing model for MPEG-4 video object decoding from a running executable specification. Our timing model is computed and verified with an instruction-set simulator of a RISC processor element containing a flat local memory model. The derived model is accurate within 6% for the average execution time.

Repository TU/e

Pure OAI Repository

Parallel implementation of arbitrary-shaped MPEG-4 decoder for multiprocessor systems

Author: Meerbergen J Jef van
Pastrnak M Milan
Stuijk S Sander
With PHN Peter de
Publication venue: 'Instytut Dermatologii Radoslaw Spiewak'
Publication date: 01/01/2006

MPEG-4 is the first standard that combines synthetic objects, like 2D/3D graphics objects, with natural rectangular and non-rectangular video objects. The independent access to individual synthetic video objects for further manipulation creates a large space for future applications. This paper addresses the optimization of such complex multimedia algorithms for implementation on multiprocessor platforms. It is shown that when choosing the correct granularity of processing for enhanced parallelism and splitting time-critical tasks, a substantial improvement in processing efficiency can be obtained. In our work, we focus on the decoder for non-rectangular (also called arbitrary-shaped) video objects. In previous work, we motivated the use of a multiprocessor System-on-Chip (SoC) setup that satisfies the requirements on the overall computation capacity. We propose optimizations of the MPEG-4 algorithm to increase the decoding throughput and make more efficient use of the multiprocessor architecture. First, we present a modification of the Repetitive Padding to increase the pipelining at block level. We identified the part of the padding algorithm that can be executed in parallel with the DCT-coefficient decoding and modified the original algorithm into two communicating tasks. Second, we introduce a synchronization mechanism that allows the processing for the Extended Padding and postprocessing (Deblocking & Deringing) filters at block level. The first optimization results in a decrease of about 58% in the computational requirements of the original Repetitive-Padding task. By introducing the previously proposed data-level parallelism and exploiting the inherent parallelism between the separated color components (Y, Cr, Cb), the computational savings are about 72% on average. Moreover, the proposed optimizations reduce the processing latency from the order of magnitude of a frame to that of a slice.

Repository TU/e

Pure OAI Repository

Efficient timing constraint derivation for optimally retiming high speed processing units

Author: Aarts EHL Emile
Lippens PER Paul
Meerbergen J Jef van
Verhaegh WFJ Wim
Werf A van der
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1994

Retiming, including pipelining, is applied to make processing units (PUs) run at a required throughput rate with a minimum number of registers. In the first step, a timing analysis of a PU is performed, which results in inequality constraints on the operations' retimings. The constraints, together with a cost function expressing the number of registers in a retimed PU, form an instance of an integer linear programming problem, which is solved to optimality in the second step. In this paper, we concentrate on the constraint derivation task. We present two new constraint derivation algorithms, one of which is more memory efficient and the other more run-time efficient. We show that the run-time efficient algorithm makes it possible to minimize the area of a huge standard cell network, possibly representing a complete IC, within acceptable run-time limits.

Repository TU/e

Pure OAI Repository