758 research outputs found

    Design and Test Space Exploration of Transport-Triggered Architectures

    Get PDF
    This paper describes a new approach in the high level design and test of transport-triggered architectures (TTA), a special type of application specific instruction processors (ASIP). The proposed method introduces the test as an additional constraint, besides throughput and circuit area. The method, that calculates the testability of the system, helps the designer to assess the obtained architectures with respect to test, area and throughput in the early phase of the design and selects the most suitable one. In order to create the templated TTA, the ¿MOVE¿ framework has been addressed. The approach is validated with respect to the ¿Crypt¿ Unix applicatio

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Improving Low Power Processor Efficiency with Static Pipelining

    Get PDF
    A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate that static pipelining can significantly reduce power consumption without adversely affecting performance

    Efficient implementation of channel estimation algorithm for beamforming

    Get PDF
    Abstract. The future 5G mobile network technology is expected to offer significantly better performance than its predecessors. Improved data rates in conjunction with low latency is believed to enable technological revolutions such as self-driving cars. To achieve faster data rates, MIMO systems can be utilized. These systems enable the use of spatial filtering technique known as beamforming. Beamforming that is based on the preacquired channel matrix is computationally very demanding causing challenges in achieving low latency. By acquiring the channel matrix as efficiently as possible, we can facilitate this challenge. In this thesis we examined the implementation of channel estimation algorithm for beamforming with a digital signal processor specialized in vector computation. We present implementations for different antenna configurations based on three different approaches. The results show that the best performance is achieved by applying the algorithm according to the limitations given by the system and the processor architecture. Although the exploitation of the parallel architecture was proved to be challenging, the implementation of the algorithm would have benefitted from the greater amount of parallelism. The current parallel resources will be a challenge especially in the future as the size of antenna configurations is expected to grow.Keilanmuodostuksen tarvitseman kanavaestimointialgoritmin tehokas toteutus. Tiivistelmä. Tulevan viidennen sukupolven mobiiliverkkoteknologian odotetaan tarjoavan merkittävästi edeltäjäänsä parempaa suorituskykyä. Tämän suorituskyvyn tarjoamat suuret datanopeudet yhdistettynä pieneen latenssiin uskotaan mahdollistavan esimerkiksi itsestään ajavat autot. Suurempien datanopeuksien saavuttamiseksi voidaan hyödyntää monitiekanavassa käytettävää MIMO-systeemiä, joka mahdollistaa keilanmuodostuksena tunnetun spatiaalisen suodatusmenetelmän käytön. Etukäteen hankittuun kanavatilatietoon perustuva keilanmuodostus on laskennallisesti erittäin kallista. Tämä aiheuttaa haasteita verkon pienen latenssivaatimuksen saavuttamisessa. Tässä työssä tutkittiin keilanmuodostukselle tarkoitetun kanavaestimointialgoritmin tehokasta toteutusta hyödyntäen vektorilaskentaan erikoistunutta prosessoriarkkitehtuuria. Työssä esitellään kolmea eri lähestymistapaa hyödyntävät toteutukset eri kokoisille antennikonfiguraatioille. Tuloksista nähdään, että paras suorituskyky saavutetaan sovittamalla algoritmi järjestelmän ja arkkitehtuurin asettamien rajoitusten mukaisesti. Vaikka rinnakkaisarkkitehtuurin hyödyntäminen asetti omat haasteensa, olisi algoritmin toteutus hyötynyt suuremmasta rinnakkaisuuden määrästä. Nykyinen rinnakkaisuuden määrä tulee olemaan haaste erityisesti tulevaisuudessa, sillä antennikonfiguraatioiden koon odotetaan kasvavan

    A hybrid transport/control operation triggered architecture

    Get PDF
    We present an approach to a scalable and extensible processor architecture with inherent parallelism named synZEN. One aim was to create a synthesizable application specific processor which can be mapped to an FPGA. Besides architectural features like the interconnection network for flexible data transport and synZEN units with communication managing interface we give an overview of the programming model, show basic operation design and depict assembler notations to program these architecture. The paper closes with a brief toolchain overview and some synthesis results that support our design decisions

    Instruction-set architecture synthesis for VLIW processors

    Get PDF

    Generation of Customized RISC-V Implementations

    Get PDF
    Processor customization has become increasingly important for achieving better performance and energy efficiency in embedded systems. However, customizing processors is time-consuming and error-prone work. The design effort is reduced by describing the processor architecture with high-level languages that are then used to generate the processor implementation. In addition to processor customization, open source hardware and standardization have become increasingly more popular. RISC-V that is a relatively new open standard instruction set architecture, has gained traction both in academia and industry. This thesis work added a RISC-V extension to the OpenASIP toolset that is developed at Tampere University. OpenASIP has wide support for customizing and generating transport triggered architectures. Transport triggered architectures have an exposed datapath that is visible to the programmer, which allows a lower level programming interface. The hardware generation and customization features in OpenASIP were reused by utilizing a transport triggered architecture as the internal microarchitecture together with a microcode unit. The extension generates the RISC-V implementations from an architecture description, which reduces the design effort of customizing the implementation. The RISC-V generator developed in this thesis has customization points for the bypass network, amount of pipeline stages, operation latencies and an optional addition of the standard M extension. The generator was evaluated by generating RISC-V cores with different customization points and comparing their performance and post-synthesis properties with open source implementations. The generated cores with bypass network achieved better performance while consuming slightly more area than the smallest reference design. The microcode hardware only utilized 3.6% of the design area and did not affect the maximum clock frequency

    Kokonaislukuoptimointiin perustuva koodigenerointi näkyvän datapolun arkkitehtuureille

    Get PDF
    As the use of embedded processors has spread throughout the society pervasively, the requirements for the processors have become much more diverse causing general purpose processors to be inefficient on many occasions. This creates the need for customized processors that are tailored for a particular use case. Transport triggered architecture is a processor architecture template that exploits the instruction level parallelism. The architecture provides the basic building blocks and means to construct custom tailored processors. Transport triggered architecture processors are statically scheduled, thus powerful instruction scheduling algorithms can bring up significant efficiency increases in terms of chip area, clock frequency, and energy consumption. This thesis proposes an integer linear programming model for the instruction scheduling problem of the transport triggered architecture. The model describes the architecture characteristics, the particular processor resource constraints, and the operation dependencies of the scheduled program. It is possible to optimize the model for various criterion, for example to achieve as energy efficient processors as possible. This scheduling algorithm was implemented to the high-level language compiler of the TTA-based Co-design Environment, which is a toolset for designing processors using the transport triggered architecture template. The model was tested and measured with example problems such as complex number arithmetics, and vector dot product. Such example algorithms are typically executed in embedded processors and parallelize reasonably well. The performance results were compared to the existing heuristic graph-based scheduling algorithm of the toolset compiler. The study indicates that the integer linear programming based instruction scheduler produced significantly shorter schedules compared to the heuristic scheduler. In addition, the amount of register access in the compiled programs was generally notably less than those of the heuristic scheduler. On the other hand, the proposed scheduler used distinctly more execution time than the heuristic scheduler
    corecore