365 research outputs found

    Generation of Customized RISC-V Implementations

    Get PDF
    Processor customization has become increasingly important for achieving better performance and energy efficiency in embedded systems. However, customizing processors is time-consuming and error-prone work. The design effort is reduced by describing the processor architecture with high-level languages that are then used to generate the processor implementation. In addition to processor customization, open source hardware and standardization have become increasingly more popular. RISC-V that is a relatively new open standard instruction set architecture, has gained traction both in academia and industry. This thesis work added a RISC-V extension to the OpenASIP toolset that is developed at Tampere University. OpenASIP has wide support for customizing and generating transport triggered architectures. Transport triggered architectures have an exposed datapath that is visible to the programmer, which allows a lower level programming interface. The hardware generation and customization features in OpenASIP were reused by utilizing a transport triggered architecture as the internal microarchitecture together with a microcode unit. The extension generates the RISC-V implementations from an architecture description, which reduces the design effort of customizing the implementation. The RISC-V generator developed in this thesis has customization points for the bypass network, amount of pipeline stages, operation latencies and an optional addition of the standard M extension. The generator was evaluated by generating RISC-V cores with different customization points and comparing their performance and post-synthesis properties with open source implementations. The generated cores with bypass network achieved better performance while consuming slightly more area than the smallest reference design. The microcode hardware only utilized 3.6% of the design area and did not affect the maximum clock frequency

    Improving Low Power Processor Efficiency with Static Pipelining

    Get PDF
    A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate that static pipelining can significantly reduce power consumption without adversely affecting performance

    Kokonaislukuoptimointiin perustuva koodigenerointi näkyvän datapolun arkkitehtuureille

    Get PDF
    As the use of embedded processors has spread throughout the society pervasively, the requirements for the processors have become much more diverse causing general purpose processors to be inefficient on many occasions. This creates the need for customized processors that are tailored for a particular use case. Transport triggered architecture is a processor architecture template that exploits the instruction level parallelism. The architecture provides the basic building blocks and means to construct custom tailored processors. Transport triggered architecture processors are statically scheduled, thus powerful instruction scheduling algorithms can bring up significant efficiency increases in terms of chip area, clock frequency, and energy consumption. This thesis proposes an integer linear programming model for the instruction scheduling problem of the transport triggered architecture. The model describes the architecture characteristics, the particular processor resource constraints, and the operation dependencies of the scheduled program. It is possible to optimize the model for various criterion, for example to achieve as energy efficient processors as possible. This scheduling algorithm was implemented to the high-level language compiler of the TTA-based Co-design Environment, which is a toolset for designing processors using the transport triggered architecture template. The model was tested and measured with example problems such as complex number arithmetics, and vector dot product. Such example algorithms are typically executed in embedded processors and parallelize reasonably well. The performance results were compared to the existing heuristic graph-based scheduling algorithm of the toolset compiler. The study indicates that the integer linear programming based instruction scheduler produced significantly shorter schedules compared to the heuristic scheduler. In addition, the amount of register access in the compiled programs was generally notably less than those of the heuristic scheduler. On the other hand, the proposed scheduler used distinctly more execution time than the heuristic scheduler

    Efficient implementation of channel estimation algorithm for beamforming

    Get PDF
    Abstract. The future 5G mobile network technology is expected to offer significantly better performance than its predecessors. Improved data rates in conjunction with low latency is believed to enable technological revolutions such as self-driving cars. To achieve faster data rates, MIMO systems can be utilized. These systems enable the use of spatial filtering technique known as beamforming. Beamforming that is based on the preacquired channel matrix is computationally very demanding causing challenges in achieving low latency. By acquiring the channel matrix as efficiently as possible, we can facilitate this challenge. In this thesis we examined the implementation of channel estimation algorithm for beamforming with a digital signal processor specialized in vector computation. We present implementations for different antenna configurations based on three different approaches. The results show that the best performance is achieved by applying the algorithm according to the limitations given by the system and the processor architecture. Although the exploitation of the parallel architecture was proved to be challenging, the implementation of the algorithm would have benefitted from the greater amount of parallelism. The current parallel resources will be a challenge especially in the future as the size of antenna configurations is expected to grow.Keilanmuodostuksen tarvitseman kanavaestimointialgoritmin tehokas toteutus. Tiivistelmä. Tulevan viidennen sukupolven mobiiliverkkoteknologian odotetaan tarjoavan merkittävästi edeltäjäänsä parempaa suorituskykyä. Tämän suorituskyvyn tarjoamat suuret datanopeudet yhdistettynä pieneen latenssiin uskotaan mahdollistavan esimerkiksi itsestään ajavat autot. Suurempien datanopeuksien saavuttamiseksi voidaan hyödyntää monitiekanavassa käytettävää MIMO-systeemiä, joka mahdollistaa keilanmuodostuksena tunnetun spatiaalisen suodatusmenetelmän käytön. Etukäteen hankittuun kanavatilatietoon perustuva keilanmuodostus on laskennallisesti erittäin kallista. Tämä aiheuttaa haasteita verkon pienen latenssivaatimuksen saavuttamisessa. Tässä työssä tutkittiin keilanmuodostukselle tarkoitetun kanavaestimointialgoritmin tehokasta toteutusta hyödyntäen vektorilaskentaan erikoistunutta prosessoriarkkitehtuuria. Työssä esitellään kolmea eri lähestymistapaa hyödyntävät toteutukset eri kokoisille antennikonfiguraatioille. Tuloksista nähdään, että paras suorituskyky saavutetaan sovittamalla algoritmi järjestelmän ja arkkitehtuurin asettamien rajoitusten mukaisesti. Vaikka rinnakkaisarkkitehtuurin hyödyntäminen asetti omat haasteensa, olisi algoritmin toteutus hyötynyt suuremmasta rinnakkaisuuden määrästä. Nykyinen rinnakkaisuuden määrä tulee olemaan haaste erityisesti tulevaisuudessa, sillä antennikonfiguraatioiden koon odotetaan kasvavan

    Vector Operation Support for Transport Triggered Architectures

    Get PDF
    High performance and low power consumption requirements usually restrict the design process of embedded processors. Traditional design solutions do not apply to the requirements today, but instead demands exploiting varying levels of parallelism. In order to reduce design time and effort, a powerful toolset is required to design new parallel processors effectively. TTA-based Co-design Environment (TCE) is a toolset developed in Tampere University of Technology for designing customized parallel processors. It is based on a modular Transport Triggered Architecture (TTA) processor architecture template, which provides easy customization and allows exploiting instruction-level parallelism for high performance execution. Single Instruction, Multiple Data (SIMD) paradigm provides powerful data-level parallel vector computation for many applications in embedded processing. It is one of the most common ways to exploit parallelism in today's processor designs in order to gain greater execution efficiency and, therefore, to meet the performance requirements. This work describes how data-level parallel SIMD support is introduced and integrated to the TCE design flow for more diverse parallelism support. The support allows designers to customize and program processors with wide vector operations. The work presents the required modification points along with the new tools that were added to the toolset. Much weight is given for the retargetable compiler, which must be able to adapt to all resources on TTA machines. The added tools were required to provide as much automatic behavior as possible to maintain effective design flow. In addition, the thesis presents how the modifications and new features were verified

    A hybrid transport/control operation triggered architecture

    Get PDF
    We present an approach to a scalable and extensible processor architecture with inherent parallelism named synZEN. One aim was to create a synthesizable application specific processor which can be mapped to an FPGA. Besides architectural features like the interconnection network for flexible data transport and synZEN units with communication managing interface we give an overview of the programming model, show basic operation design and depict assembler notations to program these architecture. The paper closes with a brief toolchain overview and some synthesis results that support our design decisions
    corecore