Abstract-This paper presents a generic video processing platform architecture enabling HW-SW partitioning with high flexibility and programmability. This modular concept is suitable for a large range of applications and features and is used for implementation of a TV platform. A key feature is to create SW programmability for video signal manipulation at the functional and system levels and to give cost-effective HW support for pixel-based processing.
I. INTRODUCTION
Implementing multimedia architectures that include advanced video processing requires an increasingly large effort, both in VLSI design and in the corresponding software. Chips for consumer-electronics products are currently being designed containing up to 10 million gates and having a computing power in the order of 10 BOPS (Billion Operations Per Second) [1], while the corresponding embedded software approaches 1 MByte for TVs. In order to keep pace with this complexity growth, reusability of subsystems, both in software and hardware, is indispensable. In the past, several architectures for TV applications have been proposed [2][3], but they do not offer sufficient flexibility for future TV systems. In this paper we propose a modular and extensible video processing system that provides these properties. We present the total system architecture, based on two chips. Three aspects are discussed in more detail: Signal Computing, Memory, and Communication. In [7], examples of TV applications for the first realization of this system are presented.
II. SIGNAL COMPUTING
The proposed architecture template should enable flexible codesign of the hardware and software. The video signals are organized as tasks of variable length (see Section V). High-speed signal computing is performed in autonomous coprocessing units. Low-speed computing and high-level programming of the coprocessor hardware are carried out by a CPU. Preferably, the CPU is powerful enough to execute several tasks in software, besides providing overall system control. However, the CPU is not used for synchronization or scheduling of the hardware, in order to avoid inefficient use of execution time and complex software stacks. The coprocessors execute typical video functions which are required in any case in a video system, such as video scaling, pixel-based sharpness enhancement, noise reduction, field-rate conversion, and so on. For this reason, such functions are most efficiently implemented in hardware. Figure 1 portrays a diagram of the video platform hardware and Figure 2 gives the layered software model.
III. MEMORY
To create a cost-effective solution, one uniform background memory is used, based on a standard off-the-shelf RAM. The RAM can be accessed by both the CPU and the coprocessors. One unified memory map is applied to enable a generic communication protocol between CPU and surrounding modules. To avoid excessive memory access, the microcontroller and the coprocessors have small local cache memories, e.g. a vertical filter coprocessor may have line memories locally. For a chosen application area, memory bandwidth allocation should be analyzed to ensure sufficient memory capacity for each coprocessor.
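As a rough illustration of such a bandwidth analysis, the following Python sketch sums the background-memory traffic of a few video streams and compares it with the usable RAM bandwidth. It is a minimal sketch under stated assumptions: the names `Stream` and `budget`, the video formats, the bytes per pixel, the peak bandwidth and the efficiency factor are all hypothetical and are not taken from the platform described here.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """One video stream that passes through the background memory (hypothetical model)."""
    name: str
    width: int              # active pixels per line
    height: int             # active lines per field or frame
    rate_hz: float          # fields or frames per second
    bytes_per_pixel: float  # storage format, e.g. 2 for 4:2:2 sampling at 8 bits
    accesses: int           # how often each pixel is written to or read from RAM

    def bandwidth(self) -> float:
        """Required memory bandwidth in bytes per second."""
        return (self.width * self.height * self.rate_hz
                * self.bytes_per_pixel * self.accesses)


def budget(streams, peak_bytes_per_s, efficiency):
    """Return (required, usable, fits) for a set of streams.

    `efficiency` is an assumed fraction of the peak bandwidth that remains
    usable after overhead such as refresh and page misses.
    """
    usable = peak_bytes_per_s * efficiency
    required = sum(s.bandwidth() for s in streams)
    return required, usable, required <= usable


if __name__ == "__main__":
    streams = [
        # Assumed example: a full-size field written once and read once.
        Stream("main window", 720, 288, 50, 2, 2),
        # Assumed example: a quarter-size window written once.
        Stream("small window", 360, 144, 50, 2, 1),
    ]
    required, usable, fits = budget(streams, 200_000_000, 0.5)
    print(f"required {required / 1e6:.1f} MB/s, usable {usable / 1e6:.1f} MB/s, fits: {fits}")
```

Local line memories, as mentioned above for a vertical filter, would show up in such a model as a lower `accesses` count for the stream concerned.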
IV. COMMUNICATION
The coprocessors are interconnected via a programmable switch-bar for independent communication of video signals and graphics. This allows much more bandwidth than a conventional bus approach and it enables parallel computing with individual coprocessors. Control and programming of the coprocessors is realized by using a sep- This protocol is an open standard to facilitate the exchange of design modules, thereby enabling the creation of module libraries. It provides on-chip communication between components which are typically processor cores, memories, I/O interfaces, peripherals (UARTs, timers) and application-specific functions (e.g. an MPEG decoder in a set-top box). Direct Memory Access is used to create a general and hardware-independent communication between the CPU and the coprocessors. This is separated into two parts.
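The idea of a programmable switch-bar can be sketched as a small routing table in which each coprocessor input selects the output that feeds it. The Python model below is an assumption-laden illustration only: the class `SwitchBar`, its methods and the port names are hypothetical, and nothing here reflects the actual register interface or timing of the chips.

```python
class SwitchBar:
    """Toy model of a programmable crossbar (hypothetical, not the real hardware).

    Each input port is driven by at most one output port, while one output
    port may feed several input ports.
    """

    def __init__(self, outputs, inputs):
        self.outputs = set(outputs)
        self.inputs = set(inputs)
        self.route = {}  # input port -> output port that drives it

    def connect(self, src, dst):
        """Program a connection from output port `src` to input port `dst`."""
        if src not in self.outputs:
            raise ValueError(f"unknown output port: {src}")
        if dst not in self.inputs:
            raise ValueError(f"unknown input port: {dst}")
        if dst in self.route:
            raise ValueError(f"input port already driven: {dst}")
        self.route[dst] = src

    def disconnect(self, dst):
        """Release input port `dst` so that it can be reprogrammed."""
        self.route.pop(dst, None)

    def transfer(self, samples):
        """Deliver the samples of one step to every connected input.

        `samples` maps an output port to the value it produces in this step.
        All programmed connections are served in the same step, which is the
        property that distinguishes a crossbar from a shared bus.
        """
        return {dst: samples[src]
                for dst, src in self.route.items() if src in samples}


if __name__ == "__main__":
    bar = SwitchBar(
        outputs=["input.out", "noise_reduction.out", "scaler.out"],
        inputs=["noise_reduction.in", "scaler.in", "display.in"],
    )
    bar.connect("input.out", "noise_reduction.in")
    bar.connect("noise_reduction.out", "scaler.in")
    bar.connect("scaler.out", "display.in")
    print(bar.transfer({"input.out": 1, "noise_reduction.out": 2, "scaler.out": 3}))
```

Reprogramming the table, for instance bypassing a coprocessor by connecting its predecessor directly to its successor, changes the processing chain without touching the coprocessors themselves.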
V. TASK-ORIENTED PROCESSING
The use of various video windows of different sizes on the display leads to variable processing requirements. For this reason, scalable video computing is pursued, instead of processing of continuous fixed-format video streams. For example, a PiP processing task is smaller than a full-sized image background. Therefore, tasks of variable length, corresponding with the picture size, are distinguished. If tasks are small, computing power can be assigned to another task, so that the hardware modules can be re-used. As a result, a coprocessor should be able to execute several tasks of variable length. Communication between two coprocessors is performed via small buffers. Processing of video data is enabled as long as data is available and/or can be stored at the subsequent coprocessor.
This model, called dynamic dataflow, can be operated under well-defined conditions [8] in order to ensure a flawless execution of tasks. [7]. A second application is found in set-top boxes, where some modules can be omitted (e.g. noise reduction), and replaced by MPEG decoding modules, such as a variable-length decoder and an inverse DCT. The architectural framework can be re-used, together with a set of coprocessors, the control SW and the pixel-based graphics.
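The data-driven execution rule of Section V, in which a coprocessor processes data only while input is available and the buffer towards the next coprocessor has room, can be illustrated with a small simulation. The Python code below is a minimal sketch under stated assumptions: `Fifo`, `Coprocessor` and `run` are hypothetical names, the buffer size and the per-unit functions are arbitrary, and the conditions of [8] are not modeled.

```python
from collections import deque


class Fifo:
    """Small bounded buffer between two coprocessors."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def can_pop(self):
        return len(self.items) > 0

    def can_push(self):
        return len(self.items) < self.capacity

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.popleft()


class Coprocessor:
    """Processes one data unit per step when its firing condition holds."""

    def __init__(self, name, func, inp, out):
        self.name, self.func, self.inp, self.out = name, func, inp, out

    def step(self):
        # Fire only if data is available AND the next buffer can store the result.
        if self.inp.can_pop() and self.out.can_push():
            self.out.push(self.func(self.inp.pop()))
            return True
        return False


def run(task, stages, first, last):
    """Feed one task (a list of data units of any length) through the chain."""
    pending = deque(task)
    results = []
    while len(results) < len(task):
        if pending and first.can_push():  # source injects one unit if there is room
            first.push(pending.popleft())
        for stage in stages:              # every coprocessor checks its own condition
            stage.step()
        if last.can_pop():                # sink removes one finished unit
            results.append(last.pop())
    return results


if __name__ == "__main__":
    b0, b1, b2 = Fifo(2), Fifo(2), Fifo(2)
    chain = [
        Coprocessor("scale", lambda x: x * 2, b0, b1),
        Coprocessor("enhance", lambda x: x + 1, b1, b2),
    ]
    # The same chain is re-used for a short task and for a longer one.
    print(run([1, 2, 3], chain, b0, b2))
    print(run(list(range(10)), chain, b0, b2))
```

No central scheduler appears in the sketch: each stage decides locally whether it can proceed, which mirrors the statement in Section II that the CPU is not used for synchronization or scheduling of the hardware.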
