367 research outputs found

    Huffman-based Code Compression Techniques for Embedded Systems

    Get PDF

    Compiler optimization and ordering effects on VLIW code compression

    Get PDF

    Compiler optimization and ordering effects on VLIW code compression

    Get PDF
    Code size has always been an important issue for all embedded applications as well as larger systems. Code compression techniques have been devised as a way of battling bloated code; however, the impact of VLIW compiler methods and outputs on these compression schemes has not been thoroughly investigated. This paper describes the application of single- and multipleinstruction dictionary methods for code compression to decrease overall code size for the TI TMS320C6xxx DSP family. The compression scheme is applied to benchmarks taken from the Mediabench benchmark suite built with differing compiler optimization parameters. In the single instruction encoding scheme, it was found that compression ratios were not a useful indicator of the best overall code size – the best results (smallest overall code size) were obtained when the compression scheme was applied to sizeoptimized code. In the multiple instruction encoding scheme, changing parallel instruction order was found to only slightly improve compression in unoptimized code and does not affect the code compression when it is applied to builds already optimized for size

    Design and Simulation of High Performance Parallel Architectures Using the ISAC Language

    Get PDF
    Most of modern embedded systems for multimediaand network applications are based on parallel data streamprocessing. The data processing can be done using very longinstruction word processors (VLIW), or using more than onehigh performance application-specific instruction set processor(ASIPs), or even by their combination on single chip.Design and testing of these complex systems is time-consumingand iterative process. Architecture description languages (ADLs)are one of the most effective solutions for single processor design.However, support for description of parallel architectures andmulti-processor systems is very low or completely missing innowadays ADLs. This article presents utilization of newextensions for existing architecture description language ISAC.These extensions are used for easy and fast prototyping andtesting of parallel based systems and processors

    VLSI architecture design approaches for real-time video processing

    Get PDF
    This paper discusses the programmable and dedicated approaches for real-time video processing applications. Various VLSI architecture including the design examples of both approaches are reviewed. Finally, discussions of several practical designs in real-time video processing applications are then considered in VLSI architectures to provide significant guidelines to VLSI designers for any further real-time video processing design works

    Multimedia terminal system-on-chip design and simulation

    Get PDF
    This paper proposes a design approach based on integrated architectural and system-on-chip (SoC) simulations. The main idea is to have an efficient framework for the design and the evaluation of multimedia terminals, allowing a fast system simulation with a definable degree of accuracy. The design approach includes the simulation of very long instruction word (VLIW) digital signal processors (DSPs), the utilization of a device multiplexing the media streams, and the emulation of the real-time media acquisition. This methodology allows the evaluation of both the multimedia algorithm implementations and the hardware platform, giving feedback on the complete SoC including the interaction between modules and conflicts in accessing either the bus or shared resources. An instruction set architecture (ISA) simulator and an SoC simulation environment compose the integrated framework. In order to validate this approach, the evaluation of an audio-video multiprocessor terminal is presented, and the complete simulation test results are reported

    Performance and area evaluations of processor-based benchmarks on FPGA devices

    Get PDF
    The computing system on SoCs is being long-term research since the FPGA technology has emerged due to its personality of re-programmable fabric, reconfigurable computing, and fast development time to market. During the last decade, uni-processor in a SoC is no longer to deal with the high growing market for complex applications such as Mobile Phones audio and video encoding, image and network processing. Due to the number of transistors on a silicon wafer is increasing, the recent FPGAs or embedded systems are advancing toward multi-processor-based design to meet tremendous performance and benefit this kind of systems are possible. Therefore, is an upcoming age of the MPSoC. In addition, most of the embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and easy to build homogenous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming a lot more powerful and enable to create datapath of logic units from high-level algorithms such as C to HDL and available for partitioning a HW/SW concurrent methodology. A range of embedded processors is able to implement on a FPGA-based prototyping to integrate the CPUs on a programmable device. This research is, firstly represent different types of computer architectures in modern embedded processors that are followed in different type of software applications (eg. Multi-threading Operations or Complex Functions) on FPGA-based SoCs; and secondly investigate their capability by executing a wide-range of multimedia software codes (Integer-algometric only) in different models of the processor-systems (uni-processor or multi-processor or Co-design), and finally compare those results in terms of the benchmarks and resource utilizations within FPGAs. All the examined programs were written in standard C and executed in a variety numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors and their customizable configuration or hardware datapath being generated are limited by a target FPGA resource, and designers need to understand the FPGA-based tradeoffs that have been considered - Speed versus Area. For this experimental purpose, I defined benchmarks into DLP / HLS catalogues, which are "data" and "function" intensive respectively. The programs of DLP will be executed in LEON3 MP and LE1 CMP multi-processor systems and the programs of HLS in the LegUp Co-design system on target FPGAs. In preliminary, the performance of the soft-core processors will be examined by executing all the benchmarks. The whole story of this thesis work centres on the issue of the execute times or the speed-up and area breakdown on FPGA devices in terms of different programs

    Siirtoliipaisuarkkitehtuurin muuttuvanmittaisten käskyjen pakkaus

    Get PDF
    The Static Random-Access Memory (SRAM) modules used for embedded microprocessor devices consume a large portion of the whole system’s power. The memory module consumes static power on keeping awake and dynamic power on memory accesses. The power dissipation of the instruction memory can be limited by using code compression methods, which reduce the memory size. The compression may require the use of variable length instruction formats in the processor. The power-efficient design of variable length instruction fetch and decode units is challenging for static multiple-issue processors, because such architectures have simple hardware to begin with, as they aim for very low power consumption on embedded platforms. The power saved by using these compression approaches, which necessitate more complex logic, is easily lost on inefficient processor design. This thesis proposes an implementation for instruction template-based compression, its decompression and two instruction fetch design alternatives for variable length instruction encoding on Transport Triggered Architecture (TTA), a static multiple-issue exposed data path architecture. Both of the new fetch and decode units are integrated into the TTA-based Co-design Environment (TCE), which is a toolset for rapid designing and prototyping of processors based on TTA. The hardware description of the fetch units is verified on a register transfer level and benchmarked using the CHStone test suite. Furthermore, the fetch units are synthesized on a 40 nm standard cell Application Specific Integrated Circuit (ASIC) technology library for area, performance and power consumption measurements. The power cost of the variable length instruction support is compared to the power savings from memory reduction, which is evaluated using HP Labs’ CACTI tool. The compression approach reaches an average program size reduction of 44% at best with a set of test programs, and the total power consumption of the system is reduced. The thesis shows that the proposed variable length fetch designs are sufficiently low-power oriented for TTA processors to benefit from the code compression

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code
    corecore