25 research outputs found

    Hardware Optimizations for Low-Power Processors

    In the design of modern processors it is important to take power consumption into account. Small processor systems, such as mobile devices, benefit from power optimization in the form of longer battery life. Optimization also helps to meet the often tight thermal design constraints of mobile devices. Power optimizations can be applied at every abstraction level of the design. At the architecture level, choosing the optimal architecture is often not straightforward. Processors based on the transport triggered architecture (TTA) exploit instruction-level parallelism efficiently and are a good choice for low-power applications. With them, a processor designer can implement a variety of power and performance optimizations, some of which are unique to the transport triggered architecture. In this master's thesis, a literature review of the most commonly used power and performance optimizations was first carried out. Four of them were implemented in the TTA-based Co-design Environment (TCE), developed at Tampere University of Technology, which enables the design and programming of TTA processors. To analyze their effects, the optimizations were applied to three processor cores designed for different purposes, which were synthesized with Synopsys Design Compiler. All cores benefited from the optimizations, achieving in the best case a 26% reduction in power consumption with a 3% increase in area.

    OpenASIP 2.0 : Co-Design Toolset for RISC-V Application-Specific Instruction-Set Processors

    Application-specific instruction-set processors (ASIPs) can improve performance or energy-efficiency for a set of applications of interest while retaining flexibility through compiler-supported programmability. In the past years, the open source hardware community has become extremely active, mainly fueled by the massive popularity of the open-standard RISC-V instruction set architecture. However, the community still lacks an open source ASIP co-design tool that supports rapid customization of RISC-V-based processors with an automatically retargetable programming toolchain. To this end, we introduce OpenASIP 2.0: a co-design toolset that builds on our earlier ASIP customization toolset work by extending it to support customization of RISC-V-based processors. It enables RTL generation as well as high-level language programming of RISC-V processors with custom instructions. In this paper, in addition to describing the toolset's key technical internals, we demonstrate it with customization cases for AES, CRC and SHA applications. With the example custom instructions easily integrated using the toolset, the run time was reduced by 44% on average compared to the standard RISC-V ISA. The speedups were achieved with a negligible datapath area overhead of 1.5% and a 1.4% reduction in the maximum clock frequency.
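
    As a rough illustration of how a custom RISC-V instruction might be invoked from C code, the sketch below wraps a hypothetical single-cycle CRC step instruction in inline assembly using the GNU `.insn` directive. The opcode, funct fields and the name crc_step are assumptions made here for illustration; they are not the encodings, intrinsics or compiler support that OpenASIP itself generates.

        #include <stdint.h>

        /* Hypothetical custom R-type instruction in the RISC-V custom-0
         * opcode space (0x0b). Encoding and semantics are assumed for
         * illustration only. */
        static inline uint32_t crc_step(uint32_t crc, uint32_t data)
        {
            uint32_t result;
            __asm__ volatile (".insn r 0x0b, 0x0, 0x00, %0, %1, %2"
                              : "=r"(result)
                              : "r"(crc), "r"(data));
            return result;
        }

        /* Usage: fold a buffer of words into a running CRC. */
        uint32_t crc_words(const uint32_t *buf, int n)
        {
            uint32_t crc = 0xffffffffu;
            for (int i = 0; i < n; ++i)
                crc = crc_step(crc, buf[i]);
            return crc;
        }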

    Energy-Efficient Instruction Streams for Embedded Processors

    The information and communication technology (ICT) sector is consuming an increasing proportion of global electricity production. At the heart of ICT are computer systems that perform computations on data. During the last two decades the rate of improvement in their energy-efficiency has declined. Although the best energy-efficiency is obtained with systems that execute a fixed set of tasks, such systems are inflexible. Flexibility can be improved with programmability in the form of instruction set support. However, the resulting instruction stream incurs area and energy overheads. To reduce them, a variety of instruction stream components have been proposed. In particular, first-level components closest to the compute elements in the memory hierarchy have been studied carefully. Although dynamically controlled components are easy to integrate into systems and allow a simplified view for programmers, software-controlled components have been shown to improve energy-efficiency. While they have been researched previously, the energy-efficiency of programmable processors is not yet at the level of fixed-function accelerators, and there is room for novel methods utilizing fine-grained software control. In addition to the novel first-level components, memory technologies used in the instruction streams have received wide interest in recent years. While emerging memory technologies may result in extreme efficiency in future devices, customizing the currently used technologies may offer benefits with less effort in the near future. This thesis proposes methods to improve instruction stream energy-efficiency with the goal of narrowing the gap between instruction-set-programmable and fixed-function computer systems. By evaluating design choices in first-level instruction stream components and allowing fine-grained software control, energy consumption is reduced. Similarly, the efficiency of unconventional instruction memories is improved by exposing their contents to the program compiler. These methods help future computer systems to keep increasing their energy-efficiency and reduce ICT's share of global energy consumption.

    Programmable Dictionary Code Compression for Instruction Stream Energy Efficiency

    We propose a novel instruction compression scheme based on fine-grained programmable dictionaries. At its core is a compile-time, region-based control-flow analysis that selectively updates the dictionary contents at runtime, minimizing the update overheads while maximizing the beneficial use of the dictionary slots. Unlike previous work, our approach selects regions of instructions to compress at compile time and changes dictionary contents in a fine-grained manner at runtime, with the primary goal of reducing the energy footprint of the processor instruction stream. The proposed instruction compression scheme is evaluated using RISC-V as an example instruction set architecture. The energy savings are compared to an instruction scratchpad and a filter cache as the next-level storage. The method reduces instruction stream energy consumption by up to 21%, and by 5.5% on average, when compared to the RISC-V C extension, with a 1% runtime overhead and a negligible hardware overhead. The previous state-of-the-art programmable dictionary compression method provides a slightly better compression ratio, but induces about 30% runtime overhead.
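
    To make the idea concrete, the sketch below shows one simplified way the compile-time side of such a scheme could pick dictionary contents for a single program region: the most frequent instruction words in the region are given dictionary slots and replaced by short indices, while the rest stay uncompressed. The fixed slot count, the data types and the one-dictionary-per-region assumption are simplifications made here; the paper's actual control-flow analysis and runtime update scheduling are more involved.

        #include <stdint.h>
        #include <stdlib.h>

        #define DICT_SLOTS 16   /* assumed dictionary size */

        struct entry { uint32_t word; unsigned count; };

        static int by_count_desc(const void *a, const void *b)
        {
            const struct entry *x = a, *y = b;
            return (y->count > x->count) - (y->count < x->count);
        }

        /* Count distinct instruction words in a region and fill the
         * dictionary with the most frequent ones. Returns slots used. */
        unsigned build_region_dictionary(const uint32_t *insns, unsigned n,
                                         uint32_t dict[DICT_SLOTS])
        {
            struct entry *tab = calloc(n, sizeof *tab);
            unsigned distinct = 0;
            if (!tab)
                return 0;

            for (unsigned i = 0; i < n; ++i) {
                unsigned j;
                for (j = 0; j < distinct && tab[j].word != insns[i]; ++j)
                    ;
                if (j == distinct)
                    tab[distinct++].word = insns[i];
                tab[j].count++;
            }

            qsort(tab, distinct, sizeof *tab, by_count_desc);

            unsigned used = distinct < DICT_SLOTS ? distinct : DICT_SLOTS;
            for (unsigned i = 0; i < used; ++i)
                dict[i] = tab[i].word;

            free(tab);
            return used;
        }

        /* Return a short dictionary index for a word, or -1 if it stays
         * uncompressed in the instruction stream. */
        int compress_word(uint32_t insn, const uint32_t dict[DICT_SLOTS],
                          unsigned used)
        {
            for (unsigned i = 0; i < used; ++i)
                if (dict[i] == insn)
                    return (int)i;
            return -1;
        }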

    Energy-Delay Trade-offs in Instruction Register File Design

    In order to decrease latency and energy consumption, processors use hierarchical memory systems to store temporally and spatially related instructions close to the core. An instruction register file (IRF) is an energy-efficient solution for the lowest level in the instruction memory hierarchy. Being compiler-controlled, it removes the area and energy overheads involved in cache tag checking and adds flexibility in separating instruction fetch and execution. In this paper, we systematically evaluate for the first time the effect of three IRF design variations on energy and delay against an unoptimized baseline IRF. Having instruction fetch and decode with the IRF in the same pipeline stage allows minimal-delay branching, but results in a low operating clock frequency and an impaired energy-delay product compared to splitting them into two stages. Assuring instruction presence in the IRF before execution with software reduces the area and increases the maximum clock frequency compared to assurance with hardware, but requires compiler analysis. With the proposed compiler-analyzed instruction placement and co-designed hardware implementation, energy consumption with the best IRF variant is reduced by 9% on average with the EEMBC Coremark and CHStone benchmarks. The energy-delay product is improved by 23% when compared to the baseline IRF approach.
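
    Since the trade-off is reported through the energy-delay product, a small worked sketch of that metric may help: EDP is simply energy multiplied by execution time, so a variant that saves energy but lengthens the critical path can still lose on EDP. The numbers below are invented purely to illustrate the metric and the fused versus split fetch/decode comparison; they are not measurements from the paper.

        #include <stdio.h>

        /* Energy-delay product: energy (J) times execution time (s). */
        static double edp(double energy_j, double time_s)
        {
            return energy_j * time_s;
        }

        int main(void)
        {
            /* Fused fetch+decode variant: minimal-delay branching but a
             * lower clock frequency, hence longer runtime (assumed). */
            double edp_fused = edp(1.00e-6, 2.00e-6);

            /* Split variant: slightly more energy per run, but shorter
             * runtime thanks to the higher clock frequency (assumed). */
            double edp_split = edp(1.05e-6, 1.50e-6);

            printf("fused fetch+decode EDP: %.3e Js\n", edp_fused);
            printf("split fetch/decode EDP: %.3e Js\n", edp_split);
            printf("relative improvement:   %.1f %%\n",
                   100.0 * (1.0 - edp_split / edp_fused));
            return 0;
        }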

    Energy Efficient Low Latency Multi-issue Cores for Intelligent Always-On IoT Applications

    Advanced Internet-of-Things applications require control-oriented code to be executed with low latency for fast responsiveness, while their advanced signal processing and decision-making tasks require computational capability. For this context, we propose three multi-issue core designs featuring an exposed-datapath architecture with high performance, while retaining energy-efficiency. These features are achieved by exploiting instruction-level parallelism, fast branching and the use of an instruction register file. With benchmarks in the control-flow and signal processing application domains, we measured in the best case 64% lower energy consumption compared to a state-of-the-art RISC core, while consuming less silicon area. A high-performance design point reaches nearly 2.6 GHz operating frequency in the best case, over a 2× improvement, while simultaneously achieving a 14% improvement in system energy-delay product.

    Xor-Masking: A Novel Statistical Method for Instruction Read Energy Reduction in Contemporary SRAM Technologies

    Pervasive computing calls for ultra-low-power devices to extend battery life enough to enable usability in everyday life. Especially in devices involving programmable processors, the energy consumption of integrated memories often plays a critical role. Consequently, contemporary memory technologies focus more on energy-efficiency aspects, with new custom CMOS SRAM cells with tailored energy consumption profiles constantly being proposed. This paper proposes a method that exploits such contemporary low-power SRAM memories, which are energy-optimized for storing a certain logic value, to improve the energy-efficiency of instruction fetching, a major energy overhead in programmable designs. The method utilizes a low-overhead xor-masking approach combined with statistical program analysis to produce optimal masks that reduce the occurrence of the more energy-consuming bit values in the fetched instructions. In comparison to the "bus invert" technique typically used with similar SRAMs, the proposed method incurs minimal area overhead while still reducing the total energy consumption of an example LatticeMico32 core by up to 5%. The improvement to instruction memory energy consumption alone is up to 13% with a set of benchmarks.
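
    The statistical side of such a scheme can be sketched as follows: for every bit position, count how often the more expensive logic value occurs across the static instruction words, and set the corresponding mask bit whenever flipping that position would make the cheaper value the majority. Instructions are then stored xor-ed with the mask and xor-ed again at fetch. The assumption that '1' is the costly value and the 32-bit word width are illustrative choices here; the actual cost model depends on the SRAM cells used, and this is not the paper's exact analysis.

        #include <stdint.h>
        #include <stddef.h>

        /* Derive a static xor mask from the program image: if a bit
         * position is '1' (assumed costly value) in more than half of the
         * instruction words, flipping it makes '0' the majority, so the
         * mask bit is set. */
        uint32_t derive_xor_mask(const uint32_t *insns, size_t n)
        {
            size_t ones[32] = {0};
            uint32_t mask = 0;

            for (size_t i = 0; i < n; ++i)
                for (unsigned b = 0; b < 32; ++b)
                    if (insns[i] & (1u << b))
                        ones[b]++;

            for (unsigned b = 0; b < 32; ++b)
                if (2 * ones[b] > n)   /* majority of words have '1' here */
                    mask |= 1u << b;

            return mask;
        }

        /* The image is stored masked; the fetch path undoes the mask. */
        uint32_t encode_for_storage(uint32_t insn, uint32_t mask) { return insn ^ mask; }
        uint32_t decode_at_fetch(uint32_t stored, uint32_t mask)  { return stored ^ mask; }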

    Instruction Fetch Energy Reduction with Biased SRAMs

    Especially in programmable processors, the energy consumption of integrated memories can become a limiting design factor due to thermal dissipation power constraints and limited battery capacity. Consequently, contemporary improvement efforts on memory technologies are focusing more on energy-efficiency aspects, which has resulted in biased CMOS SRAM cells that increase energy efficiency by favoring one logical value over the other. In this paper, xor-masking, a method for exploiting such contemporary low-power SRAM memories, is proposed to improve the energy-efficiency of instruction fetching. Xor-masking utilizes static program analysis statistics to produce optimal encoding masks that reduce the occurrence of the more energy-consuming bit values in the fetched instructions. The method is evaluated on LatticeMico32, a small RISC core popular in ultra-low-power designs, and on a wide-instruction-word high-performance low-power DSP. Compared to the previous “bus invert” technique typically used with similar SRAMs, the proposed method reduces instruction read energy consumption by up to 13% on the LatticeMico32 and by up to 38% on the DSP core.
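
    For contrast with the per-word "bus invert" baseline mentioned above, the sketch below estimates how many costly bits would be read from a biased SRAM under three encodings: unencoded, bus-inverted (invert a word whenever more than half of its bits are costly, at the price of one extra stored flag bit per word), and statically xor-masked with a mask derived as in the previous sketch. Counting only the assumed-expensive '1' bits is a deliberate simplification of real read energy.

        #include <stdint.h>
        #include <stddef.h>

        /* Count '1' bits, taken here as the energy-expensive value. */
        static unsigned costly_bits(uint32_t w)
        {
            unsigned c = 0;
            while (w) { c += w & 1u; w >>= 1; }
            return c;
        }

        /* Costly bits read without any encoding. */
        size_t cost_plain(const uint32_t *insns, size_t n)
        {
            size_t total = 0;
            for (size_t i = 0; i < n; ++i)
                total += costly_bits(insns[i]);
            return total;
        }

        /* Bus-invert style baseline: store the inverted word plus one
         * invert flag whenever that lowers the number of costly bits. */
        size_t cost_bus_invert(const uint32_t *insns, size_t n)
        {
            size_t total = 0;
            for (size_t i = 0; i < n; ++i) {
                unsigned c = costly_bits(insns[i]);
                total += (c > 16) ? (32 - c + 1) : c;  /* +1 for the flag */
            }
            return total;
        }

        /* Static xor mask derived offline (see the previous sketch). */
        size_t cost_xor_mask(const uint32_t *insns, size_t n, uint32_t mask)
        {
            size_t total = 0;
            for (size_t i = 0; i < n; ++i)
                total += costly_bits(insns[i] ^ mask);
            return total;
        }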