461 research outputs found

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained ReconïŹgurable Array (CGRA) architectures accelerate the same inner loops that beneïŹt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efïŹciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ïŹ‚exibility, performance, and power-efïŹciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ïŹne-tuning of source code

    A case for merging the ILP and DLP paradigms

    Get PDF
    The goal of this paper is to show that instruction level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that can not be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that this architecture can reach a performance equivalent to a superscalar processor that sustained 10 instructions per cycle. We will see that the machine exploiting both types of parallelism improves upon the ILP-only machine by factors of 1.5-1.8. We also present a study on the scalability of both paradigms and show that, when we increase resources to reach a 16-issue machine, the advantage of the ILP+DLP machine over the ILP-only machine increases up to 2.0-3.45. While the peak achieved IPC for the ILP machine is 4, the ILP+DLP machine exceeds 10 instructions per cycle.Peer ReviewedPostprint (published version

    Fast 2D-DCT implementations for VLIW processors

    Get PDF

    State of the art baseband DSP platforms for Software Defined Radio: A survey

    Get PDF
    Software Defined Radio (SDR) is an innovative approach which is becoming a more and more promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by different universities and industries to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, as well as potential future trends.Peer reviewe

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    Efficient implementation of channel estimation algorithm for beamforming

    Get PDF
    Abstract. The future 5G mobile network technology is expected to offer significantly better performance than its predecessors. Improved data rates in conjunction with low latency is believed to enable technological revolutions such as self-driving cars. To achieve faster data rates, MIMO systems can be utilized. These systems enable the use of spatial filtering technique known as beamforming. Beamforming that is based on the preacquired channel matrix is computationally very demanding causing challenges in achieving low latency. By acquiring the channel matrix as efficiently as possible, we can facilitate this challenge. In this thesis we examined the implementation of channel estimation algorithm for beamforming with a digital signal processor specialized in vector computation. We present implementations for different antenna configurations based on three different approaches. The results show that the best performance is achieved by applying the algorithm according to the limitations given by the system and the processor architecture. Although the exploitation of the parallel architecture was proved to be challenging, the implementation of the algorithm would have benefitted from the greater amount of parallelism. The current parallel resources will be a challenge especially in the future as the size of antenna configurations is expected to grow.Keilanmuodostuksen tarvitseman kanavaestimointialgoritmin tehokas toteutus. TiivistelmÀ. Tulevan viidennen sukupolven mobiiliverkkoteknologian odotetaan tarjoavan merkittÀvÀsti edeltÀjÀÀnsÀ parempaa suorituskykyÀ. TÀmÀn suorituskyvyn tarjoamat suuret datanopeudet yhdistettynÀ pieneen latenssiin uskotaan mahdollistavan esimerkiksi itsestÀÀn ajavat autot. Suurempien datanopeuksien saavuttamiseksi voidaan hyödyntÀÀ monitiekanavassa kÀytettÀvÀÀ MIMO-systeemiÀ, joka mahdollistaa keilanmuodostuksena tunnetun spatiaalisen suodatusmenetelmÀn kÀytön. EtukÀteen hankittuun kanavatilatietoon perustuva keilanmuodostus on laskennallisesti erittÀin kallista. TÀmÀ aiheuttaa haasteita verkon pienen latenssivaatimuksen saavuttamisessa. TÀssÀ työssÀ tutkittiin keilanmuodostukselle tarkoitetun kanavaestimointialgoritmin tehokasta toteutusta hyödyntÀen vektorilaskentaan erikoistunutta prosessoriarkkitehtuuria. TyössÀ esitellÀÀn kolmea eri lÀhestymistapaa hyödyntÀvÀt toteutukset eri kokoisille antennikonfiguraatioille. Tuloksista nÀhdÀÀn, ettÀ paras suorituskyky saavutetaan sovittamalla algoritmi jÀrjestelmÀn ja arkkitehtuurin asettamien rajoitusten mukaisesti. Vaikka rinnakkaisarkkitehtuurin hyödyntÀminen asetti omat haasteensa, olisi algoritmin toteutus hyötynyt suuremmasta rinnakkaisuuden mÀÀrÀstÀ. Nykyinen rinnakkaisuuden mÀÀrÀ tulee olemaan haaste erityisesti tulevaisuudessa, sillÀ antennikonfiguraatioiden koon odotetaan kasvavan
    • 

    corecore