22 research outputs found

    Interaktives Lernen mit vollständig digital arbeitenden Multimedia-Systemen

    No full text

    Programming VLIW architectures with super operations

    No full text
    The length of a statically created instruction schedule determines to a great extent the performance of program executions on VLIW architectures. In this paper we present a simple, yet effective, method to reduce the length of a static instruction schedule by introducing new hardware operations, referred to as super operations. A super operation replaces a number of operations, while maintaining functionality, hence decreasing the total number of operations to be executed and thereby eliminating the dependencies between them. In order to replace a number of operations, super operations must often process more operands and produce more results than traditional operations. The Philips TM-1000 is a VLIW based architecture. Its CPU is a 5-issue machine with 27 functional units, each connected to one issue-slot. To support super operations, we extend the hardware with special functional units which are connected to more than one issue-slot. In this paper we discuss the modifications that were made to the compiler in order to support super operations and we demonstrate the ease with which super operations can be applied by the application programmer. To a lesser extent, we address consequences of super operations concerning the hardware. Furthermore, we demonstrate the benefit of super operations by showing the performance improvement for some multimedia applications

    Simba

    No full text

    TriMedia CPU64 Architecture

    No full text
    We present a new VLIW core as a successor to the TriMedia TM1000. The processor is targeted for embedded use in media-processing devices like DTVs and set-top boxes. Intended as a core, its design must be supplemented with on-chip co-processors to obtain a cost-effective system. Good performance is obtained through a uniform 64-bit 5 issue-slot VLIW design, supporting subword parallelism with an extensive instruction set optimized with respect to media-processing. Multi-slot `super-ops' allow powerful multi-argument and multi-result operations. As an example, an IDCT algorithm shows a very low instruction count in comparison with other processors. To achieve good performance, critical sections in the application program source code need to be rewritten with vector data types and function calls for media operations. Benchmarking with several media applications was used to tune the instruction set and study cache behavior. This resulted in a VLIW architecture with wide data paths and relatively simple cpu control. 1
    corecore