669 research outputs found

    Automatic synthesis of application-specific processors

    Thesis (D. Tech. (Engineering: Electrical)) -- Central University of Technology, Free State, 2012. This thesis describes a method for the automatic generation of application-specific processors. The thesis was organized into three separate but interrelated studies, which together provide: a justification for the method used, a theory that supports the method, and a software application that realizes the method. The first study looked at how modern-day microprocessors utilize their hardware resources and proposed a metric, called core density, for measuring the utilization rate. The core density is a function of the microprocessor's instruction set and the application scheduled to run on that microprocessor. This study concluded that modern-day microprocessors use their resources very inefficiently and proposed the use of subset processors to execute the same applications more efficiently. The second study sought to provide a theoretical framework for the use of subset processors by developing a generic formal model of computer architecture. To demonstrate the model's versatility, it was used to describe a number of computer architecture components and entire computing systems. The third study describes the development of a set of software tools that enable the automatic generation of application-specific processors. The FiT toolkit automatically generates a unique Hardware Description Language (HDL) description of a processor based on an application binary file and a parameterizable template of a generic microprocessor. Area-optimized and performance-optimized custom soft processors were generated using the FiT toolkit, and the utilization of the hardware resources by the custom soft processors was characterized. The FiT toolkit was combined with an ANSI C compiler and a third-party tool for programming field-programmable gate arrays (FPGAs) to create an unconstrained C-to-silicon compiler.

    A Study on HDL Generation for Application-specific Processors

    A Cosynthesis Algorithm for Application Specific Processors with Heterogeneous Datapaths

    Semi-Automatic Optimization Using Specialized Instructions

    The design of instruction sets for application-specific processors is a difficult task. This thesis describes the selection, marking, and creation of instruction set extensions for application-specific processors. The presented semi-automatic method gives the user a simple way to select instruction set extensions by marking a section of the application's source code. The new instruction itself is then created automatically in the modelling language. This lets the user concentrate on the tasks where human ingenuity and experience matter most.

    Application of rational fractions in specialized processors

    The article considers an unconventional data representation, in the form of rational fractions, for application-specific processors. This representation provides high computational precision and makes it possible to do without floating-point numbers.
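    The idea of replacing floating-point values with rational fractions can be illustrated in software. The sketch below is only an illustration under stated assumptions: it uses Python's standard `fractions.Fraction` type, not the hardware representation from the article (which the abstract does not describe), and the helper names `accumulate_float` and `accumulate_rational` are hypothetical.

```python
from fractions import Fraction


def accumulate_float(step: float, n: int) -> float:
    """Add `step` to a running total n times using binary floating point."""
    total = 0.0
    for _ in range(n):
        total += step
    return total


def accumulate_rational(step: Fraction, n: int) -> Fraction:
    """Add `step` n times using exact numerator/denominator arithmetic."""
    total = Fraction(0)
    for _ in range(n):
        total += step
    return total


# One tenth has no exact binary floating-point form, so rounding error
# accumulates; the rational version keeps an exact integer pair instead.
float_sum = accumulate_float(0.1, 10)
rational_sum = accumulate_rational(Fraction(1, 10), 10)

print(float_sum == 1.0)    # False: the float total is slightly off
print(rational_sum == 1)   # True: the rational total is exact
```

    The trade-off in a sketch like this is that numerators and denominators can grow with each operation, so a fixed-width hardware design would presumably need some bounding or normalization strategy.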

    OpenCL-based design methodology for application-specific processors

    OpenCL is a programming language standard that enables the programmer to express an application by structuring its computation as kernels. The OpenCL compiler is given explicit freedom to parallelize the execution of kernel instances at all levels of parallelism. Compared with the traditional C programming language, which is sequential in nature, OpenCL enables higher utilization of the parallelism naturally available in hardware constructs while still having a feasible learning curve for engineers familiar with C. This paper describes the methodology and compiler techniques involved in applying OpenCL as an input language for a design flow of application-specific processors. At the core of the methodology is a whole-program optimizing compiler that links together the host and kernel code of the input OpenCL program and parallelizes the result on a customized, statically scheduled processor. The OpenCL vendor extension mechanism is used to provide clean access to custom operations. The methodology is studied with a design case to verify the scalability of the implementation at the instruction level and to exemplify the use of custom operations. The case shows that OpenCL allows producing scalable application-specific processor designs and makes it possible to gradually reach the performance of hand-tailored RTL designs by exploiting the OpenCL extension mechanism to access custom hardware operations of varying complexity.

    Hardware Reuse in Modern Application-specific Processors and Accelerators

    Effective exploitation of application-specific parallel patterns and computation operations through their direct implementation in hardware is the basis for constructing high-quality (re-)configurable application-specific instruction set processors (ASIPs) and hardware accelerators for modern, highly demanding applications. Although ASIP and accelerator synthesis receives a lot of attention from researchers and practitioners, the very important problem of hardware reuse within it is clearly underestimated and does not get enough attention in the published research. This paper is the result of collaborative industry and academic research. It analyses the problem of hardware sharing and shows its high practical relevance, its strong influence on the major circuit and system parameters, and its importance for multi-objective optimization and tradeoff exploitation. It also demonstrates that state-of-the-art synthesis tools do not sufficiently address this problem, and it gives several guidelines for enhancing hardware reuse.

    Fast Fourier transforms on energy-efficient application-specific processors

    Many of the current applications used in battery-powered devices are from the digital signal processing, telecommunication, and multimedia domains. Traditionally, application-specific fixed-function circuits have been used in these designs in the form of application-specific integrated circuits (ASICs) to reach the required performance and energy-efficiency. The complexity of these applications has increased over the years, and the design complexity has increased even faster, which implies increased design time. At the same time, there are more and more standards to be supported, so using optimised fixed-function implementations for all the functions in all the standards is impractical. The non-recurring engineering costs for integrated circuits have also increased significantly, so manufacturers can afford fewer chip iterations. Although tailoring the circuit for a specific application provides the best performance and/or energy-efficiency, such an approach lacks flexibility. For example, if an error is found after manufacturing, an expensive chip iteration is required. In addition, new functionality cannot be added afterwards to support the evolution of standards. Flexibility can be obtained with software-based implementation technologies. Unfortunately, general-purpose processors do not provide the energy-efficiency of fixed-function circuit designs. A useful trade-off between flexibility and performance is an implementation based on application-specific processors (ASPs), where programmability provides the flexibility and computational resources customised for the given application provide the performance. In this thesis, application-specific processors are considered by using the fast Fourier transform as the representative algorithm. The architectural template used here is the transport triggered architecture (TTA), which resembles very long instruction word machines but the operand execution resembles data flow machines rather than traditional operand triggering.
The developed TTA processors exploit inherent parallelism of the application. In addition, several characteristics of the application have been identified and those are exploited by developing customised functional units for speeding up the execution. Several customisations are proposed for the data path of the processor but it is also important to match the memory bandwidth to the computation speed. This calls for a memory organisation supporting parallel memory accesses. The proposed optimisations have been used to improve the energy-efficiency of the processor and experiments show that a programmable solution can have energy-efficiency comparable to fixed-function ASIC designs
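    The "inherent parallelism" of the fast Fourier transform can be seen in a textbook radix-2 formulation. The sketch below is an assumption-laden illustration, not the thesis's TTA implementation: it is a plain iterative decimation-in-time FFT, and the function name `fft_radix2` is hypothetical. Within each stage, every butterfly reads and writes its own pair of elements, so the butterflies of one stage are independent of each other and could in principle be issued to parallel functional units.

```python
import cmath


def fft_radix2(samples):
    """Iterative radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in samples]

    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; the butterflies inside one stage touch disjoint pairs.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # twiddle factor increment
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v          # butterfly: sum
                a[start + k + half] = u - v   # butterfly: difference
                w *= w_step
        size *= 2
    return a


spectrum = fft_radix2([1, 1, 1, 1])  # constant signal: all energy in bin 0
```

    The complex multiply-and-add inside the butterfly is the kind of operation a customised functional unit might accelerate, and the paired accesses at `start + k` and `start + k + half` hint at why parallel memory access matters, although the specific units and memory organisation proposed in the thesis are not given in the abstract.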

    Fast and Partially Translated Simulator for Application-Specific Processors

    The major objective of this work is to analyse the possibilities of using simulation in the development of application-specific instruction-set processors, to explore and compare common simulation techniques, and to use the findings to design a new simulation tool suitable for processor development and optimization. This thesis presents the main requirements for the new simulator and describes the design and implementation of its key parts, with emphasis on achieving the highest possible performance.

    Chameleon C2HDL Design Tool In Self-Configurable Ultrascale Computer Systems Based On Partially Reconfigurable FPGAs

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015. FPGA-based accelerators and the reconfigurable computer systems built on them require application-specific processor soft-cores to be designed, and they are effective only for the classes of problems for which these soft-cores were previously developed. In self-configurable FPGA-based computer systems, the challenge of designing the application-specific processor soft-cores is solved with C2HDL tools, which allow the soft-cores to be generated automatically. In this paper, we study how to increase the efficiency of self-configurable computer systems by using partially reconfigurable FPGAs and the Chameleon C2HDL design tool, in line with the goals of the project entitled "Improvement of heterogeneous systems efficiency using self-configurable FPGA-based computing", which is a part of the NESUS action. One feature of the Chameleon C2HDL design tool is its ability to generate a number of application-specific processor soft-cores that execute the same algorithm but differ in the amount of FPGA resources required for their implementation. If a self-configurable computer system is based on partially reconfigurable FPGAs, this feature allows it to acquire, at every moment of its operation, a configuration that provides optimal use of its reconfigurable logic at a given level of hardware multitasking.