1,627 research outputs found
Distributed data cache designs for clustered VLIW processors
Wire delays are a major concern for current and forthcoming processors. One approach to deal with this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the L1 data cache typically remains centralized in what we call partially distributed architectures. However, as technology evolves, the relative latency of such a centralized cache will increase, with a significant impact on performance. In this paper, we propose partitioning the L1 data cache among clusters for clustered VLIW processors. We refer to this kind of design as fully distributed processors. In particular, we propose and evaluate three different configurations: a snoop-based cache coherence scheme, a word-interleaved cache, and flexible L0 buffers managed by the compiler. For each alternative, instruction scheduling techniques targeted to cyclic code are developed. Results for the Mediabench suite show that the performance of such fully distributed architectures is always better than the performance of a partially distributed one with the same amount of resources. In addition, the key aspects of each fully distributed configuration are explored. Peer Reviewed. Postprint (published version)
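As a rough illustration of the word-interleaved configuration named in the abstract above, the sketch below maps a byte address to the cluster whose cache module would hold that word. The word size, the cluster count, and the modulo mapping are assumptions chosen for the example; the abstract does not specify them, and the function names are hypothetical.

```python
# Minimal sketch of word-interleaved placement, under assumed parameters.
WORD_BYTES = 4      # assumed word size in bytes (not stated in the abstract)
NUM_CLUSTERS = 4    # assumed number of clusters (not stated in the abstract)


def home_cluster(address: int,
                 word_bytes: int = WORD_BYTES,
                 num_clusters: int = NUM_CLUSTERS) -> int:
    """Return the cluster whose cache module holds the word at `address`.

    Consecutive words are spread round-robin over the clusters.
    """
    return (address // word_bytes) % num_clusters


def is_local_access(address: int, issuing_cluster: int) -> bool:
    """True when the issuing cluster is also the word's home cluster."""
    return home_cluster(address) == issuing_cluster
```

Under this assumed mapping, a scheduler that knows the stride of a memory instruction could try to place it in the cluster where `is_local_access` holds most often.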
Adaptive OFDM System Design For Cognitive Radio
Recently, Cognitive Radio has been proposed as a promising technology to improve spectrum utilization. A highly flexible OFDM system is considered a good candidate for Cognitive Radio baseband processing, where individual carriers can be switched off for frequencies occupied by a licensed user. To support such an adaptive OFDM system, we propose a Multiprocessor System-on-Chip (MPSoC) architecture that can be dynamically reconfigured. However, the complexity and flexibility of the baseband processing make the MPSoC design a difficult task. This paper presents a design technology for mapping flexible OFDM baseband processing for Cognitive Radio onto an MPSoC.
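The idea of switching off individual carriers on frequencies occupied by a licensed user can be sketched as a simple mask over one OFDM symbol. This is only an illustration: the function names and the list-based symbol representation are hypothetical, and how the occupied carriers are detected is outside the sketch.

```python
# Minimal sketch: null the subcarriers that overlap a licensed user's band.
def mask_subcarriers(symbols, occupied):
    """Return a copy of `symbols` with the carriers in `occupied` set to zero.

    `symbols` holds one complex value per subcarrier; `occupied` holds the
    indices of carriers assumed to be in use by a licensed user.
    """
    blocked = set(occupied)
    return [0j if k in blocked else s for k, s in enumerate(symbols)]


def active_carriers(num_carriers, occupied):
    """List the carrier indices that remain available to the secondary user."""
    blocked = set(occupied)
    return [k for k in range(num_carriers) if k not in blocked]
```

The masked list would then feed the inverse FFT of the transmitter, so the switched-off carriers carry no energy.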
NanoMagnet Logic: an Architectural Viewpoint
Among the possible implementations of Field-Coupled devices, NanoMagnet Logic is attractive for its low power consumption and the possibility of combining memory and logic in the same device. However, the nature of these technologies is so different from CMOS transistors that the implications for the circuit architecture must be carefully taken into account. In this work we analyze the most important issues related to the design of complex circuits using this technology and discuss how they influence the architectural level. We propose detailed solutions to these problems that also improve overall performance. As a result of this analysis, the types of circuits and applications that constitute the best target for this technology are identified. The analysis is performed on NanoMagnet Logic, but the results can be applied to any QCA technology.
Transformations of High-Level Synthesis Codes for High-Performance Computing
Specialized hardware architectures promise a major step in performance and
energy efficiency over the traditional load/store devices currently employed in
large scale computing systems. The adoption of high-level synthesis (HLS) from
languages such as C/C++ and OpenCL has greatly increased programmer
productivity when designing for such platforms. While this has enabled a wider
audience to target specialized hardware, the optimization principles known from
traditional software design are no longer sufficient to implement
high-performance codes. Fast and efficient codes for reconfigurable platforms
are thus still challenging to design. To alleviate this, we present a set of
optimizing transformations for HLS, targeting scalable and efficient
architectures for high-performance computing (HPC) applications. Our work
provides a toolbox for developers, where we systematically identify classes of
transformations, the characteristics of their effect on the HLS code and the
resulting hardware (e.g., increases data reuse or resource consumption), and
the objectives that each transformation can target (e.g., resolve interface
contention, or increase parallelism). We show how these can be used to
efficiently exploit pipelining, on-chip distributed fast memory, and on-chip
streaming dataflow, allowing for massively parallel architectures. To quantify
the effect of our transformations, we use them to optimize a set of
throughput-oriented FPGA kernels, demonstrating that our enhancements are
sufficient to scale up parallelism within the hardware constraints. With the
transformations covered, we hope to establish a common framework for
performance engineers, compiler developers, and hardware developers, to tap
into the performance potential offered by specialized hardware architectures
using HLS.
Flexible compiler-managed L0 buffers for clustered VLIW processors
Wire delays are a major concern for current and forthcoming processors. One approach to attack this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the data cache remains centralized. However, as technology evolves, the latency of such a centralized cache increases, leading to a significant performance impact. In this paper, we propose to include flexible low-latency buffers in each cluster in order to reduce the performance impact of higher cache latencies. The reduced number of entries in each buffer permits the design of flexible ways to map data from L1 to these buffers. The proposed L0 buffers are managed by the compiler, which is responsible for deciding which memory instructions make use of them. Effective instruction scheduling techniques are proposed to generate code that exploits these buffers. Results for the Mediabench benchmark suite show that the performance of a clustered VLIW processor with a unified L1 data cache is improved by 16% when such buffers are used. In addition, the proposed architecture also shows significant advantages over both MultiVLIW processors and clustered processors with a word-interleaved cache, two state-of-the-art designs with a distributed L1 data cache. Peer Reviewed. Postprint (published version)
Cognitive Radio for Emergency Networks
In the scope of the Adaptive Ad-hoc Freeband (AAF) project, an emergency network built on top of Cognitive Radio is proposed to alleviate the spectrum shortage problem, which is the major limitation for emergency networks. Cognitive Radio has been proposed as a promising technology to solve today's spectrum scarcity problem by allowing a secondary user in the unused parts of the spectrum that are actually assigned to primary services. Cognitive Radio has to work in different frequency bands and various wireless channels and support multimedia services. A heterogeneous reconfigurable System-on-Chip (SoC) architecture is proposed to enable the evolution from traditional software defined radio to Cognitive Radio.
Real-time digital signal processing for new wavelength-to-the-user optical access networks
Optical access networks now provide high capacity to end users, with a growing availability of multimedia content that can be streamed to fixed or mobile devices. One of the most flexible and low-cost approaches is the Passive Optical Network (PON) used in Fiber-to-the-Home (FTTH). Owing to growing bandwidth demands, Wavelength Division Multiplexing (WDM) PON, and later ultra-dense WDM (udWDM) PON with narrow channel spacing, has been deployed to increase the number of users served through a single fiber.
The udWDM-PON with coherent technology and advanced digital signal processing (DSP) is an attractive solution for next-generation optical access networks. Thanks to the higher sensitivity and improved channel selectivity of coherent detection with efficient DSP, optical networks can support a larger number of users over longer distances.
Since cost is the main concern in optical access networks, this thesis presents DSP architectures for a coherent receiver (Rx) based on low-cost, directly phase-modulated commercial DFB lasers. The proposals are fully in line with the concept of wavelength-to-the-user, where each client in the optical network is assigned an individual wavelength.
Next, in a 6.25 GHz spaced udWDM grid, high sensitivity is achieved in real-time field-programmable gate array (FPGA) implementations with optimized DSP techniques and the phase-shift-keying (PSK) modulation format.
Moreover, this thesis reduces the hardware complexity of optical carrier recovery (CR) with two different strategies: first, a differential mth-power frequency estimator (FE) using look-up tables (LUTs); second, a LUT-free CR architecture that optimizes power consumption and hardware resources and improves channel selectivity in terms of speed and robustness.
Furthermore, a very simple but efficient clock recovery design makes possible a symbol-rate DSP architecture for a polarization diversity (POD) structure that processes data using only one sample per symbol (1-sps). This makes the DSP independent of the state of polarization (SOP), even with a low-cost optical front end and low-speed analog-to-digital converters (ADCs), and keeps performance and sensitivity high in real-time FPGA implementations. Postprint (published version)
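The differential mth-power frequency estimator mentioned in this abstract can be illustrated with a textbook floating-point form. This is a sketch under assumptions, not the thesis's LUT-based FPGA design: it assumes ideal M-PSK symbols sampled once per symbol, a constant frequency offset, and no noise; the function name and parameters are hypothetical.

```python
import cmath
import math


def diff_mth_power_fe(samples, m, symbol_period):
    """Estimate a carrier frequency offset in Hz from M-PSK samples.

    Each product cur * conj(prev) carries the data phase step plus the
    rotation 2*pi*df*T. Raising it to the m-th power removes the data phase
    (a multiple of 2*pi/m), leaving m times the rotation. The estimate is
    unambiguous only while |m * df * symbol_period| < 0.5.
    """
    acc = 0j
    for prev, cur in zip(samples, samples[1:]):
        acc += (cur * prev.conjugate()) ** m
    return cmath.phase(acc) / (2 * math.pi * m * symbol_period)
```

Summing the m-th-power products before taking the phase averages over the block, which is what makes the estimate tolerant of noise in practice; a hardware version would replace the power and phase operations with cheaper approximations.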
A Design Methodology for Space-Time Adapter
This paper presents a solution to efficiently explore the design space of
communication adapters. In most digital signal processing (DSP) applications,
the overall architecture of the system is significantly affected by
communication architecture, so the designers need specifically optimized
adapters. By explicitly modeling these communications within an effective
graph-theoretic model and analysis framework, we automatically generate an
optimized architecture, named Space-Time AdapteR (STAR). Our design flow inputs
a C description of Input/Output data scheduling, and user requirements
(throughput, latency, parallelism...), and formalizes communication constraints
through a Resource Constraints Graph (RCG). The RCG properties enable an
efficient architecture space exploration in order to synthesize a STAR
component. The proposed approach has been tested to design an industrial data
mixing block example: an Ultra-Wideband interleaver. Comment: ISBN: 978-1-59593-606-