13 research outputs found

    Modulo Scheduling for the TMS320C6x VLIW DSP Architecture

    Digital Signal Processing (DSP) architectures are specialized for high-performance numerical algorithms such as those found in communication and multimedia applications. The advent of efficient compilers for DSP processors is a recent development and a growing research area. The Texas Instruments TMS320C6x (C6x) is a Very Long Instruction Word (VLIW) DSP architecture capable of issuing eight instructions in parallel. In this paper we present the results of implementing a software pipelining algorithm for the C6x. We describe the C6x and detail the architectural features that impact software pipelining, such as a moderately sized register file, constraints on code size, and multiple resource choices for some operations. We discuss why we chose modulo scheduling to implement software pipelining, how we adapted the algorithm to the C6x, and the improvements we made to enable the algorithm to search harder for better schedule solutions. We present the results of m..

    Co-design of Compiler and Hardware Techniques to Reduce Program Code Size on a VLIW Processor

    Code size is a primary concern in the embedded computing community. Minimizing physical memory requirements reduces total system cost and improves performance and power efficiency. VLIW processors rely on the compiler to statically encode the ILP in the program before its execution, and because of this, code size is larger relative to other processors. In this paper we describe the co-design of compiler optimizations and processor architecture features that have progressively reduced code size across three generations of a VLIW processor.

    Implementation and optimization of the OpenMP accelerator model for the TI Keystone II architecture

    The TI Keystone II architecture provides a unique combination of ARM Cortex-A15 processors with high-performance TI C66x floating-point DSPs on a single low-power System-on-Chip (SoC). Commercially available systems such as the HP ProLiant m800 and nCor

    OpenMP on the Low-Power TI Keystone II ARM/DSP System-on-Chip

    The Texas Instruments (TI) Keystone II architecture integrates an octa-core C66x DSP with a quad-core ARM Cortex-A15 MPCore processor in a non-cache-coherent shared memory environment. This System-on-a-Chip (SoC) offers very high Floating Point Operation

    Using OpenMP: The Next Step: Affinity, Accelerators, Tasking, and SIMD

    A guide to the most recent, advanced features of the widely used OpenMP parallel programming model, with coverage of major features in OpenMP 4.5.

    Modulo scheduling without overlapped lifetimes
