Search CORE

461 research outputs found

Coarse-grained reconfigurable array architectures

Author: A Lambrechts
B Bougard
B Bougard
B Mei
B Mei
B Mei
B Sutter De
G Venkataramani
H Park
H Park
J Lee
JMP Cardoso
JW Waerdt van de
K Berkel van
K Bondalapati
K Sankaralingam
KE Coons
LH Lee
M Ahn
M Gebhart
M Schlansker
M Taylor
M Woh
MD Galanis
MH Lee
S Friedman
SA Mahlke
T Oh
Y Kim
Y Kim
Y Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Coarse-Grained Reconﬁgurable Array (CGRA) architectures accelerate the same inner loops that beneﬁt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efﬁciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ﬂexibility, performance, and power-efﬁciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ﬁne-tuning of source code

Crossref

Ghent University Academic Bibliography

A case for merging the ILP and DLP paradigms

Author: Espasa Sans Roger
Quintana Rodríguez Francisca
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998
Field of study

The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that can not be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that this architecture can reach a performance equivalent to a superscalar processor that sustained 10 instructions per cycle. We will see that the machine exploiting both types of parallelism improves upon the ILP-only machine by factors of 1.5-1.8. We also present a study on the scalability of both paradigms and show that, when we increase resources to reach a 16-issue machine, the advantage of the ILP+DLP machine over the ILP-only machine increases up to 2.0-3.45. While the peak achieved IPC for the ILP machine is 4, the ILP+DLP machine exceeds 10 instructions per cycle.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Fast 2D-DCT implementations for VLIW processors

Author: Bull DR
Canagarajah CN
Sohm OP
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/1999
Field of study

Explore Bristol Research

Brief History of Computer Architecture Evolution and Future Trends

Author: Groves R
Publication venue: CERN
Publication date: 01/01/1995
Field of study

CERN Document Server

State of the art baseband DSP platforms for Software Defined Radio: A survey

Author: Claudio Brunelli
Fabio Garzia
Heikki Berg
Jari Nurmi
Omer Anjum
Tapani Ahonen
Publication venue: Springer Nature
Publication date: 01/01/2011
Field of study

Software Defined Radio (SDR) is an innovative approach which is becoming a more and more promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by different universities and industries to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, as well as potential future trends.Peer reviewe

Springer - Publisher Connector

Trepo - Institutional Repository of Tampere University

Exploring Processor and Memory Architectures for Multimedia

Author: Iranpour Ali
Publication venue
Publication date: 01/01/2012
Field of study

Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

CiteSeerX

Lund University Publications

Efficient implementation of channel estimation algorithm for beamforming

Author: Afflekt A. (Arttu)
Publication venue: University of Oulu
Publication date: 15/06/2020
Field of study

Abstract. The future 5G mobile network technology is expected to offer significantly better performance than its predecessors. Improved data rates in conjunction with low latency is believed to enable technological revolutions such as self-driving cars. To achieve faster data rates, MIMO systems can be utilized. These systems enable the use of spatial filtering technique known as beamforming. Beamforming that is based on the preacquired channel matrix is computationally very demanding causing challenges in achieving low latency. By acquiring the channel matrix as efficiently as possible, we can facilitate this challenge. In this thesis we examined the implementation of channel estimation algorithm for beamforming with a digital signal processor specialized in vector computation. We present implementations for different antenna configurations based on three different approaches. The results show that the best performance is achieved by applying the algorithm according to the limitations given by the system and the processor architecture. Although the exploitation of the parallel architecture was proved to be challenging, the implementation of the algorithm would have benefitted from the greater amount of parallelism. The current parallel resources will be a challenge especially in the future as the size of antenna configurations is expected to grow.Keilanmuodostuksen tarvitseman kanavaestimointialgoritmin tehokas toteutus. Tiivistelmä. Tulevan viidennen sukupolven mobiiliverkkoteknologian odotetaan tarjoavan merkittävästi edeltäjäänsä parempaa suorituskykyä. Tämän suorituskyvyn tarjoamat suuret datanopeudet yhdistettynä pieneen latenssiin uskotaan mahdollistavan esimerkiksi itsestään ajavat autot. Suurempien datanopeuksien saavuttamiseksi voidaan hyödyntää monitiekanavassa käytettävää MIMO-systeemiä, joka mahdollistaa keilanmuodostuksena tunnetun spatiaalisen suodatusmenetelmän käytön. Etukäteen hankittuun kanavatilatietoon perustuva keilanmuodostus on laskennallisesti erittäin kallista. Tämä aiheuttaa haasteita verkon pienen latenssivaatimuksen saavuttamisessa. Tässä työssä tutkittiin keilanmuodostukselle tarkoitetun kanavaestimointialgoritmin tehokasta toteutusta hyödyntäen vektorilaskentaan erikoistunutta prosessoriarkkitehtuuria. Työssä esitellään kolmea eri lähestymistapaa hyödyntävät toteutukset eri kokoisille antennikonfiguraatioille. Tuloksista nähdään, että paras suorituskyky saavutetaan sovittamalla algoritmi järjestelmän ja arkkitehtuurin asettamien rajoitusten mukaisesti. Vaikka rinnakkaisarkkitehtuurin hyödyntäminen asetti omat haasteensa, olisi algoritmin toteutus hyötynyt suuremmasta rinnakkaisuuden määrästä. Nykyinen rinnakkaisuuden määrä tulee olemaan haaste erityisesti tulevaisuudessa, sillä antennikonfiguraatioiden koon odotetaan kasvavan

University of Oulu Repository - Jultika