    Distributed data cache designs for clustered VLIW processors

    Wire delays are a major concern for current and forthcoming processors. One approach to dealing with this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the L1 data cache typically remains centralized in what we call partially distributed architectures. However, as technology evolves, the relative latency of such a centralized cache will increase, with a significant impact on performance. In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors. In particular, we propose and evaluate three different configurations: a snoop-based cache coherence scheme, a word-interleaved cache, and flexible L0 buffers managed by the compiler. For each alternative, instruction scheduling techniques targeted to cyclic code are developed. Results for the Mediabench suite show that the performance of such fully distributed architectures is always better than the performance of a partially distributed one with the same amount of resources. In addition, the key aspects of each fully distributed configuration are explored. Peer Reviewed. Postprint (published version).
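
In a word-interleaved distributed cache, consecutive memory words are spread round-robin across the clusters' cache modules, so the compiler can predict statically which cluster holds a given word. The sketch below is a minimal illustration under assumed parameters (4-byte words, four clusters); the names `home_cluster` and `is_local_access` are hypothetical and the mapping is not taken from the paper's configuration.

```python
# Hypothetical parameters, chosen only for illustration.
WORD_SIZE_BYTES = 4
NUM_CLUSTERS = 4


def home_cluster(address: int,
                 word_size: int = WORD_SIZE_BYTES,
                 num_clusters: int = NUM_CLUSTERS) -> int:
    """Cluster whose cache module holds the word at `address`
    under a simple word-interleaved mapping (assumed)."""
    word_index = address // word_size
    return word_index % num_clusters


def is_local_access(address: int, issuing_cluster: int) -> bool:
    """A load is local when it is scheduled on the cluster that owns the word;
    otherwise it needs an inter-cluster transfer."""
    return home_cluster(address) == issuing_cluster


if __name__ == "__main__":
    # Walk an array of 32-bit elements starting at address 0x1000.
    for i in range(8):
        addr = 0x1000 + i * WORD_SIZE_BYTES
        print(f"a[{i}] @ {addr:#x} -> cluster {home_cluster(addr)}")
```

For a unit-stride loop, successive iterations rotate through the clusters, which is the kind of regularity a scheduler for cyclic code could exploit to place each load on the cluster that owns its data.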

    Adaptive OFDM System Design For Cognitive Radio

    Recently, Cognitive Radio has been proposed as a promising technology to improve spectrum utilization. A highly flexible OFDM system is considered a good candidate for Cognitive Radio baseband processing, in which individual carriers can be switched off at frequencies occupied by a licensed user. To support such an adaptive OFDM system, we propose a Multiprocessor System-on-Chip (MPSoC) architecture that can be dynamically reconfigured. However, the complexity and flexibility of the baseband processing make MPSoC design a difficult task. This paper presents a design technology for mapping a flexible OFDM baseband for Cognitive Radio onto a multiprocessor System-on-Chip (MPSoC).
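
The adaptation described above amounts to leaving the subcarriers that overlap a licensed user unmodulated. The pure-Python sketch below shows how a per-subcarrier mask shapes one OFDM symbol; it uses a naive inverse DFT only to stay self-contained (a real baseband would use an FFT), and the mask, FFT size and helper names are hypothetical.

```python
import cmath
from typing import List, Sequence


def idft(X: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) inverse DFT; adequate for a small illustration."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive forward DFT, used here to check which subcarriers carry energy."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def ofdm_symbol(data: Sequence[complex], active: Sequence[bool]) -> List[complex]:
    """Place data symbols on active subcarriers only; inactive subcarriers
    (e.g. occupied by a licensed user) are set to zero."""
    if len(data) != sum(active):
        raise ValueError("need exactly one data symbol per active subcarrier")
    it = iter(data)
    X = [next(it) if on else 0j for on in active]
    return idft(X)


if __name__ == "__main__":
    n = 8
    # Hypothetical mask: subcarriers 3 and 4 overlap a licensed user.
    active = [k not in (3, 4) for k in range(n)]
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j]
    spectrum = dft(ofdm_symbol(qpsk, active))
    print([round(abs(v), 6) for v in spectrum])
```

Changing the mask at run time is all the "reconfiguration" this toy model needs; the MPSoC mapping problem the paper addresses is about doing this under real throughput constraints.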

    NanoMagnet Logic: an Architectural Viewpoint

    Among the possible implementations of field-coupled devices, NanoMagnet Logic is attractive for its low power consumption and the possibility of combining memory and logic in the same device. However, the nature of these technologies is so different from that of CMOS transistors that the implications for circuit architecture must be carefully taken into account. In this work we analyze the most important issues related to the design of complex circuits using this technology and discuss how they influence the architectural level. We propose detailed solutions to these problems and to improve overall performance. As a result of this analysis, the types of circuits and applications that are the best targets for this technology are identified. The analysis is performed on NanoMagnet Logic, but the results can be applied to any QCA technology.

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, in which we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increased data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolving interface contention or increasing parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers to tap into the performance potential offered by specialized hardware architectures using HLS.
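
As one illustration of a transformation in this general spirit (chosen here as an assumption, not quoted from the paper), consider a pipelined floating-point accumulation: each addition depends on the previous one, so a new input cannot enter the adder pipeline every cycle. Splitting the sum across several independent partial accumulators and reducing them at the end removes that loop-carried dependency. The Python below only shows the loop restructuring; the names `interleaved_sum` and `num_partials` are hypothetical, and in hardware `num_partials` would be chosen to cover the adder latency.

```python
from typing import List, Sequence


def naive_sum(xs: Sequence[float]) -> float:
    """Reference loop: every addition depends on the previous one."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc


def interleaved_sum(xs: Sequence[float], num_partials: int = 4) -> float:
    """Accumulate into `num_partials` independent partial sums in round-robin
    order, then reduce them. Consecutive iterations update different
    accumulators, so an adder with latency up to `num_partials` cycles could,
    in principle, accept a new input every cycle."""
    if num_partials < 1:
        raise ValueError("num_partials must be positive")
    partials: List[float] = [0.0] * num_partials
    for i, x in enumerate(xs):
        partials[i % num_partials] += x
    # Final reduction over the short partial-sum buffer.
    total = 0.0
    for p in partials:
        total += p
    return total


if __name__ == "__main__":
    data = [float(i) for i in range(1, 101)]
    print(naive_sum(data), interleaved_sum(data, num_partials=8))
```

Because the additions are reassociated, results can differ from the sequential loop in the last bits for general floating-point inputs; the example uses integers, which are summed exactly.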

    Flexible compiler-managed L0 buffers for clustered VLIW processors

    Wire delays are a major concern for current and forthcoming processors. One approach to attacking this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the data cache remains centralized. However, as technology evolves, the latency of such a centralized cache increases, leading to a significant performance impact. In this paper, we propose to include flexible low-latency buffers in each cluster in order to reduce the performance impact of higher cache latencies. The small number of entries in each buffer permits flexible ways of mapping data from L1 to these buffers. The proposed L0 buffers are managed by the compiler, which is responsible for deciding which memory instructions make use of them. Effective instruction scheduling techniques are proposed to generate code that exploits these buffers. Results for the Mediabench benchmark suite show that the performance of a clustered VLIW processor with a unified L1 data cache is improved by 16% when such buffers are used. In addition, the proposed architecture also shows significant advantages over both MultiVLIW processors and clustered processors with a word-interleaved cache, two state-of-the-art designs with a distributed L1 data cache. Peer Reviewed. Postprint (published version).
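
To make the idea of compiler-managed buffers concrete, the sketch below models one cluster's small buffer in front of the L1 cache: only loads that the compiler has tagged allocate entries, and all other loads bypass the buffer. The entry count, line size, LRU replacement, the `use_l0` hint and the class name `L0Buffer` are all assumptions for illustration; the paper's flexible mapping of L1 data into the buffers is not modeled.

```python
from collections import OrderedDict
from typing import Iterable, List, Tuple


class L0Buffer:
    """Tiny fully associative buffer with LRU replacement (assumed policy)."""

    def __init__(self, entries: int = 4, line_bytes: int = 16):
        self.entries = entries
        self.line_bytes = line_bytes
        self.lines: "OrderedDict[int, None]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def load(self, address: int, use_l0: bool) -> str:
        """Return 'L0' on a buffer hit, 'L1' otherwise.
        `use_l0` models the compiler's per-instruction decision."""
        if not use_l0:
            return "L1"  # untagged loads bypass the buffer entirely
        tag = address // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)  # refresh LRU position
            self.hits += 1
            return "L0"
        self.misses += 1
        self.lines[tag] = None  # allocate the line fetched from L1
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)  # evict least recently used line
        return "L1"


def run(trace: Iterable[Tuple[int, bool]], buf: L0Buffer) -> List[str]:
    """Replay a trace of (address, use_l0) pairs and record where each load hit."""
    return [buf.load(addr, hint) for addr, hint in trace]
```

With 16-byte lines, a unit-stride stream of tagged 4-byte loads misses once per line and then hits three times, which is the kind of short-range reuse a small, low-latency buffer is suited to.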

    Cognitive Radio for Emergency Networks

    In the scope of the Adaptive Ad-hoc Freeband (AAF) project, an emergency network built on top of Cognitive Radio is proposed to alleviate the spectrum shortage problem, which is the major limitation for emergency networks. Cognitive Radio has been proposed as a promising technology to solve today's spectrum scarcity problem by allowing secondary users to access unused parts of the spectrum that are actually assigned to primary services. Cognitive Radio has to work in different frequency bands and over various wireless channels, and must support multimedia services. A heterogeneous reconfigurable System-on-Chip (SoC) architecture is proposed to enable the evolution from traditional software-defined radio to Cognitive Radio.

    Real-time digital signal processing for new wavelength-to-the-user optical access networks

    Nowadays, optical access networks provide high capacity to end users, with a growing availability of multimedia content that can be streamed to fixed or mobile devices. In this regard, one of the most flexible and low-cost approaches is the Passive Optical Network (PON), which is used in Fiber-to-the-Home (FTTH). Driven by growing bandwidth demands, Wavelength Division Multiplexing (WDM) PON and, later, ultra-dense WDM (udWDM) PON with narrow channel spacing have been deployed to increase the number of users served over a single fiber. udWDM-PON with coherent technology is an attractive solution for next-generation optical access networks with advanced digital signal processing (DSP). Thanks to the higher sensitivity and improved channel selectivity of coherent detection with efficient DSP, optical networks can support a larger number of users over longer distances. Since cost is the main concern in optical access networks, this thesis presents DSP architectures for coherent receivers (Rx) based on low-cost, directly phase-modulated commercial DFB lasers. The proposals fully agree with the wavelength-to-the-user concept, in which each client in the optical network is assigned an individual wavelength. Using the optimized DSP techniques and phase-shift keying (PSK) modulation on a 6.25 GHz-spaced udWDM grid, high sensitivity is achieved in real-time field-programmable gate array (FPGA) implementations. Moreover, this thesis reduces the hardware complexity of optical carrier recovery (CR) with two different strategies: first, a differential mth-power frequency estimator (FE) that uses look-up tables (LUTs), and second, a LUT-free CR architecture, optimizing power consumption and hardware resources while also improving channel selectivity in terms of speed and robustness.
Furthermore, a very simple but efficient clock recovery makes possible a symbol-rate DSP architecture for a polarization diversity (POD) structure that processes data using only one sample per symbol (1-sps). This makes the DSP independent of the state of polarization (SOP) and, even with a low-cost optical front-end and low-speed analog-to-digital converters (ADCs), maintains high performance and sensitivity in real-time FPGA implementations. Postprint (published version).
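
For M-PSK signals, a differential mth-power frequency estimator removes the data modulation by raising the product of consecutive samples, r[k] * conj(r[k-1]), to the M-th power, and then reads the frequency offset from the phase of the accumulated result. The floating-point sketch below (QPSK, M = 4, noiseless, with a hypothetical symbol rate and offset) illustrates only the estimator itself; it does not reproduce the thesis's LUT-based or LUT-free FPGA architectures, and all function names are assumptions.

```python
import cmath
import math
import random
from typing import List, Sequence


def qpsk_with_offset(n: int, freq_offset_hz: float, symbol_rate_hz: float,
                     phase: float = 0.3, seed: int = 1) -> List[complex]:
    """Generate n noiseless QPSK symbols rotated by a carrier frequency offset."""
    rng = random.Random(seed)
    t_sym = 1.0 / symbol_rate_hz
    out = []
    for k in range(n):
        theta = math.pi / 4 + rng.randrange(4) * math.pi / 2
        out.append(cmath.exp(1j * (theta + 2 * math.pi * freq_offset_hz * k * t_sym + phase)))
    return out


def diff_mth_power_fe(r: Sequence[complex], m: int, symbol_rate_hz: float) -> float:
    """Differential M-th power frequency estimate in Hz for M-PSK.
    The estimate is unambiguous only for |offset| < symbol_rate / (2 * m)."""
    if len(r) < 2:
        raise ValueError("need at least two samples")
    acc = 0j
    for k in range(1, len(r)):
        # Phase differences between M-PSK symbols are multiples of 2*pi/m,
        # so the M-th power cancels them and leaves 2*pi*m*offset*T.
        acc += (r[k] * r[k - 1].conjugate()) ** m
    return cmath.phase(acc) * symbol_rate_hz / (2 * math.pi * m)


if __name__ == "__main__":
    rs = 1.25e9  # hypothetical symbol rate
    samples = qpsk_with_offset(256, 50e6, rs)
    print(diff_mth_power_fe(samples, 4, rs))
```

The phase computation at the end is the step that a hardware implementation would typically approximate, for example with a look-up table; here it is simply `cmath.phase`.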

    A Design Methodology for Space-Time Adapter

    This paper presents a solution for efficiently exploring the design space of communication adapters. In most digital signal processing (DSP) applications, the overall architecture of the system is significantly affected by the communication architecture, so designers need specifically optimized adapters. By explicitly modeling these communications within an effective graph-theoretic model and analysis framework, we automatically generate an optimized architecture, named Space-Time AdapteR (STAR). Our design flow takes as input a C description of Input/Output data scheduling and user requirements (throughput, latency, parallelism...), and formalizes communication constraints through a Resource Constraints Graph (RCG). The RCG properties enable efficient architecture space exploration in order to synthesize a STAR component. The proposed approach has been tested by designing an industrial data-mixing block: an Ultra-Wideband interleaver. Comment: ISBN: 978-1-59593-606-
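
An interleaver is a typical case where data is produced in one order and consumed in another, which is exactly what an adapter must buffer and reorder. The sketch below uses a generic rows-by-columns block interleaver (written row-wise, read column-wise) purely as an illustration; it is not the Ultra-Wideband interleaver from the paper and does not model the RCG. It assumes one element written and one read per cycle, with an element readable only after the cycle in which it is written, and computes the resulting minimum start latency and peak storage. All names are hypothetical.

```python
from typing import List


def block_interleave_order(rows: int, cols: int) -> List[int]:
    """Input indices in output order for a block interleaver that is written
    row by row and read column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def min_output_latency(order: List[int]) -> int:
    """Smallest start delay L (cycles) such that, with element i written at
    cycle i and output j read at cycle L + j, every read follows its write."""
    return max(src - j + 1 for j, src in enumerate(order))


def max_buffer_occupancy(order: List[int], latency: int) -> int:
    """Peak number of elements written but not yet read, under the same timing."""
    n = len(order)
    read_cycle = {src: latency + j for j, src in enumerate(order)}
    peak = 0
    for cycle in range(n + latency):
        # Count elements already written whose read happens after this cycle.
        stored = sum(1 for i in range(min(cycle + 1, n)) if read_cycle[i] > cycle)
        peak = max(peak, stored)
    return peak


if __name__ == "__main__":
    order = block_interleave_order(2, 3)
    lat = min_output_latency(order)
    print(order, lat, max_buffer_occupancy(order, lat))
```

Latency and storage computed this way are the sort of quantities that trade off against the throughput and parallelism requirements the design flow takes as input.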