Search CORE

758 research outputs found

Design and Test Space Exploration of Transport-Triggered Architectures

Author: Kerkhoff H.G.
Tangelder R.J.W.T.
Zivkovic V.A.
Publication venue: IEEE
Publication date: 01/01/2000
Field of study

This paper describes a new approach in the high level design and test of transport-triggered architectures (TTA), a special type of application specific instruction processors (ASIP). The proposed method introduces the test as an additional constraint, besides throughput and circuit area. The method, that calculates the testability of the system, helps the designer to assess the obtained architectures with respect to test, area and throughput in the early phase of the design and selects the most suitable one. In order to create the templated TTA, the ¿MOVE¿ framework has been addressed. The approach is validated with respect to the ¿Crypt¿ Unix applicatio

CiteSeerX

University of Twente Research Information

Coarse-grained reconfigurable array architectures

Author: A Lambrechts
B Bougard
B Bougard
B Mei
B Mei
B Mei
B Sutter De
G Venkataramani
H Park
H Park
J Lee
JMP Cardoso
JW Waerdt van de
K Berkel van
K Bondalapati
K Sankaralingam
KE Coons
LH Lee
M Ahn
M Gebhart
M Schlansker
M Taylor
M Woh
MD Galanis
MH Lee
S Friedman
SA Mahlke
T Oh
Y Kim
Y Kim
Y Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Coarse-Grained Reconﬁgurable Array (CGRA) architectures accelerate the same inner loops that beneﬁt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efﬁciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ﬂexibility, performance, and power-efﬁciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ﬁne-tuning of source code

Crossref

Ghent University Academic Bibliography

Improving Low Power Processor Efficiency with Static Pipelining

Author: Finlayson Ian
Tyson Gary
Uh Gang-Ryung
Whalley David
Publication venue: 'IUScholarWorks'
Publication date: 12/02/2011
Field of study

A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate that static pipelining can significantly reduce power consumption without adversely affecting performance

Crossref

Boise State University - ScholarWorks

Efficient implementation of channel estimation algorithm for beamforming

Author: Afflekt A. (Arttu)
Publication venue: University of Oulu
Publication date: 15/06/2020
Field of study

Abstract. The future 5G mobile network technology is expected to offer significantly better performance than its predecessors. Improved data rates in conjunction with low latency is believed to enable technological revolutions such as self-driving cars. To achieve faster data rates, MIMO systems can be utilized. These systems enable the use of spatial filtering technique known as beamforming. Beamforming that is based on the preacquired channel matrix is computationally very demanding causing challenges in achieving low latency. By acquiring the channel matrix as efficiently as possible, we can facilitate this challenge. In this thesis we examined the implementation of channel estimation algorithm for beamforming with a digital signal processor specialized in vector computation. We present implementations for different antenna configurations based on three different approaches. The results show that the best performance is achieved by applying the algorithm according to the limitations given by the system and the processor architecture. Although the exploitation of the parallel architecture was proved to be challenging, the implementation of the algorithm would have benefitted from the greater amount of parallelism. The current parallel resources will be a challenge especially in the future as the size of antenna configurations is expected to grow.Keilanmuodostuksen tarvitseman kanavaestimointialgoritmin tehokas toteutus. Tiivistelmä. Tulevan viidennen sukupolven mobiiliverkkoteknologian odotetaan tarjoavan merkittävästi edeltäjäänsä parempaa suorituskykyä. Tämän suorituskyvyn tarjoamat suuret datanopeudet yhdistettynä pieneen latenssiin uskotaan mahdollistavan esimerkiksi itsestään ajavat autot. Suurempien datanopeuksien saavuttamiseksi voidaan hyödyntää monitiekanavassa käytettävää MIMO-systeemiä, joka mahdollistaa keilanmuodostuksena tunnetun spatiaalisen suodatusmenetelmän käytön. Etukäteen hankittuun kanavatilatietoon perustuva keilanmuodostus on laskennallisesti erittäin kallista. Tämä aiheuttaa haasteita verkon pienen latenssivaatimuksen saavuttamisessa. Tässä työssä tutkittiin keilanmuodostukselle tarkoitetun kanavaestimointialgoritmin tehokasta toteutusta hyödyntäen vektorilaskentaan erikoistunutta prosessoriarkkitehtuuria. Työssä esitellään kolmea eri lähestymistapaa hyödyntävät toteutukset eri kokoisille antennikonfiguraatioille. Tuloksista nähdään, että paras suorituskyky saavutetaan sovittamalla algoritmi järjestelmän ja arkkitehtuurin asettamien rajoitusten mukaisesti. Vaikka rinnakkaisarkkitehtuurin hyödyntäminen asetti omat haasteensa, olisi algoritmin toteutus hyötynyt suuremmasta rinnakkaisuuden määrästä. Nykyinen rinnakkaisuuden määrä tulee olemaan haaste erityisesti tulevaisuudessa, sillä antennikonfiguraatioiden koon odotetaan kasvavan

University of Oulu Repository - Jultika

A hybrid transport/control operation triggered architecture

Author: Gremzow Carsten
Hauser Stefan
Moser Nico
Publication venue
Publication date: 01/01/2010
Field of study

We present an approach to a scalable and extensible processor architecture with inherent parallelism named synZEN. One aim was to create a synthesizable application specific processor which can be mapped to an FPGA. Besides architectural features like the interconnection network for flexible data transport and synZEN units with communication managing interface we give an overview of the programming model, show basic operation design and depict assembler notations to program these architecture. The paper closes with a brief toolchain overview and some synthesis results that support our design decisions

DepositOnce

Instruction-set architecture synthesis for VLIW processors

Author: Jordans R.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2015
Field of study

Repository TU/e

Pure OAI Repository

Generation of Customized RISC-V Implementations

Author: Hepola Kari
Publication venue
Publication date: 15/02/2022
Field of study

Processor customization has become increasingly important for achieving better performance and energy efficiency in embedded systems. However, customizing processors is time-consuming and error-prone work. The design effort is reduced by describing the processor architecture with high-level languages that are then used to generate the processor implementation. In addition to processor customization, open source hardware and standardization have become increasingly more popular. RISC-V that is a relatively new open standard instruction set architecture, has gained traction both in academia and industry. This thesis work added a RISC-V extension to the OpenASIP toolset that is developed at Tampere University. OpenASIP has wide support for customizing and generating transport triggered architectures. Transport triggered architectures have an exposed datapath that is visible to the programmer, which allows a lower level programming interface. The hardware generation and customization features in OpenASIP were reused by utilizing a transport triggered architecture as the internal microarchitecture together with a microcode unit. The extension generates the RISC-V implementations from an architecture description, which reduces the design effort of customizing the implementation. The RISC-V generator developed in this thesis has customization points for the bypass network, amount of pipeline stages, operation latencies and an optional addition of the standard M extension. The generator was evaluated by generating RISC-V cores with different customization points and comparing their performance and post-synthesis properties with open source implementations. The generated cores with bypass network achieved better performance while consuming slightly more area than the smallest reference design. The microcode hardware only utilized 3.6% of the design area and did not affect the maximum clock frequency

Trepo - Institutional Repository of Tampere University

Recommended from our members

Psi: A Silicon Compiler for Very Fast Protocol Processing

Author: Abu-Amara H.
Balraj Timothy S.
Barzilai T.
Yemini Yechiam
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1989
Field of study

Conventional protocols implementations typically fall short, by a few orders of magnitude, of supporting the speeds afforded by high-speed optical transmission media. This protocol processing bottleneck is a key hurdle in taking advantage of the opportunities presented by high-speed communications. This paper describes PSi, a silicon compiler that transforms formal protocol specifications into efficient VLSI implementations. PSi takes advantage of the parallelisms intrinsic to a given protocol to accomplish very high-speed implementations. Initial application of PSi to the IEEE 802.2 (logical link control) leads to processing rates in the order of 106 packets per second (p/s). The 802.2 was selected as a benchmark of complexity; light-weight protocols can accomplish even higher processing rates, reaching the limits set by chip clock rates (i.e., a packet per cycle). These speeds significantly exceed typical of software implementations (up to a few hundred p/s) or special hardware-assisted implementations (up to a few thousands p/s). More importantly, at these rates when the packet size is 103-4 bits the protocol throughput of 109-10 bits/sec reaches the limiting throughput afforded by memory technology. Thus, the protocol processing bottleneck is pushed to the ultimate bounds set by VLSI technologies

Columbia University Academic Commons

Kokonaislukuoptimointiin perustuva koodigenerointi näkyvän datapolun arkkitehtuureille

Author: Äijö Tomi Mikael
Publication venue
Publication date: 03/12/2014
Field of study

As the use of embedded processors has spread throughout the society pervasively, the requirements for the processors have become much more diverse causing general purpose processors to be inefficient on many occasions. This creates the need for customized processors that are tailored for a particular use case. Transport triggered architecture is a processor architecture template that exploits the instruction level parallelism. The architecture provides the basic building blocks and means to construct custom tailored processors. Transport triggered architecture processors are statically scheduled, thus powerful instruction scheduling algorithms can bring up significant efficiency increases in terms of chip area, clock frequency, and energy consumption. This thesis proposes an integer linear programming model for the instruction scheduling problem of the transport triggered architecture. The model describes the architecture characteristics, the particular processor resource constraints, and the operation dependencies of the scheduled program. It is possible to optimize the model for various criterion, for example to achieve as energy efficient processors as possible. This scheduling algorithm was implemented to the high-level language compiler of the TTA-based Co-design Environment, which is a toolset for designing processors using the transport triggered architecture template. The model was tested and measured with example problems such as complex number arithmetics, and vector dot product. Such example algorithms are typically executed in embedded processors and parallelize reasonably well. The performance results were compared to the existing heuristic graph-based scheduling algorithm of the toolset compiler. The study indicates that the integer linear programming based instruction scheduler produced significantly shorter schedules compared to the heuristic scheduler. In addition, the amount of register access in the compiled programs was generally notably less than those of the heuristic scheduler. On the other hand, the proposed scheduler used distinctly more execution time than the heuristic scheduler

Trepo - Institutional Repository of Tampere University