3 research outputs found

    Compression for reduction of off-chip video bandwidth

    The architecture for block-based video applications (e.g. MPEG/JPEG coding, graphics rendering) is usually based on a processor engine connected to an external background SDRAM memory, where reference images and data are stored. In this paper, we reduce the required memory bandwidth for MPEG coding by up to 67% by identifying the optimal block configuration and applying embedded data compression of up to a factor of four. It is shown that independent compression of fixed-sized data blocks with a fixed compression ratio can decrease the memory bandwidth for a limited set of compression factors only. To achieve this result, we exploit the statistical properties of the burst-oriented data exchange to memory. It has been found that embedded compression is particularly attractive for bandwidth reduction when a compression ratio of 2 or 4 is chosen. This moderate compression factor can be obtained with a low-cost compression scheme such as DPCM, with a small, acceptable loss of quality.
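The claim that only some compression factors reduce bandwidth can be illustrated with a simplified burst model. The sketch below is not the paper's method: it assumes a hypothetical 128-byte data block and a 32-byte memory burst, and it assumes that every transfer occupies a whole number of bursts. Under those assumptions, a compressed block saves bandwidth only when its size drops below the next burst boundary.

```python
import math

def bursts_needed(block_bytes: int, ratio: float, burst_bytes: int) -> int:
    """Number of whole bursts needed to transfer one compressed block.

    Assumption: memory is accessed in indivisible bursts, so a partially
    filled burst costs as much as a full one.
    """
    compressed_bytes = math.ceil(block_bytes / ratio)
    return math.ceil(compressed_bytes / burst_bytes)

# Hypothetical sizes chosen for illustration only.
BLOCK_BYTES = 128
BURST_BYTES = 32

for ratio in (1, 1.5, 2, 3, 4):
    n = bursts_needed(BLOCK_BYTES, ratio, BURST_BYTES)
    print(f"ratio {ratio}: {n} burst(s)")
```

In this toy model, a ratio of 3 needs as many bursts as a ratio of 2, so the extra compression effort buys no bandwidth, whereas ratios of 2 and 4 land exactly on burst boundaries.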

    On the design of multimedia software and future system architectures

    A principal challenge in reducing the cost of designing complex systems-on-chip is to pursue more generic systems for a broad range of products. For this purpose, we explore three new architectural concepts for state-of-the-art video applications. First, we discuss a reusable, scalable hardware architecture employing a hierarchical communication network that fits the natural hierarchy of the application. In a case study, we show that MPEG streaming in DTV occurs at a high level, while subsystems communicate at lower levels. The second concept is a software design that scales over a number of processors to enable reuse over a range of VLSI process technologies. We explore this via an H.264 decoder implementation that scales nearly linearly over up to eight processors by applying data partitioning. The third topic is resource scalability, which is required to satisfy real-time constraints in a system with a large number of shared resources. An example complexity-scalable MPEG-2 coder scales the required cycle budget by a factor of three, in parallel with a smooth degradation of quality.

    Programming VLIW architectures with super operations

    The length of a statically created instruction schedule determines to a great extent the performance of program execution on VLIW architectures. In this paper we present a simple yet effective method to reduce the length of a static instruction schedule by introducing new hardware operations, referred to as super operations. A super operation replaces a number of operations while maintaining functionality, hence decreasing the total number of operations to be executed and eliminating the dependencies between them. In order to replace a number of operations, super operations must often process more operands and produce more results than traditional operations. The Philips TM-1000 is a VLIW-based architecture. Its CPU is a 5-issue machine with 27 functional units, each connected to one issue slot. To support super operations, we extend the hardware with special functional units that are connected to more than one issue slot. In this paper we discuss the modifications that were made to the compiler in order to support super operations, and we demonstrate the ease with which the application programmer can apply them. To a lesser extent, we address the consequences of super operations for the hardware. Furthermore, we demonstrate the benefit of super operations by showing the performance improvement for some multimedia applications.
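How replacing several dependent operations with one super operation can shorten a schedule may be sketched with a toy dependency graph. This is an illustration under stated assumptions, not the compiler described in the abstract: every operation (including the fused one) is assumed to take one cycle, issue-slot limits are ignored so the schedule length is bounded below by the longest dependency chain, and the operation names and the `fuse` helper are hypothetical.

```python
from functools import lru_cache
from typing import Dict, Iterable, List

def critical_path(deps: Dict[str, List[str]]) -> int:
    """Longest dependency chain, assuming unit latency per operation."""
    @lru_cache(maxsize=None)
    def depth(op: str) -> int:
        return 1 + max((depth(d) for d in deps[op]), default=0)
    return max(depth(op) for op in deps)

def fuse(deps: Dict[str, List[str]], group: Iterable[str], name: str) -> Dict[str, List[str]]:
    """Replace the operations in `group` by a single operation `name`.

    Assumption: fusing the group does not create a dependency cycle.
    """
    members = set(group)
    fused: Dict[str, List[str]] = {}
    for op, ds in deps.items():
        if op in members:
            continue
        # Consumers of any fused operation now depend on the new operation.
        fused[op] = sorted({name if d in members else d for d in ds})
    # The new operation inherits all dependencies that come from outside the group.
    fused[name] = sorted({d for op in members for d in deps[op] if d not in members})
    return fused

# Toy kernel: |a - b| + |c - d| expressed as five ordinary operations.
ops = {
    "sub1": [],
    "abs1": ["sub1"],
    "sub2": [],
    "abs2": ["sub2"],
    "add": ["abs1", "abs2"],
}

print(critical_path(ops))                                 # chain sub -> abs -> add
print(critical_path(fuse(ops, list(ops), "super_sad")))   # one fused operation
```

The fused operation takes four operands and produces one result, which matches the abstract's remark that super operations often need more operands than traditional operations.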