3,078 research outputs found

    Implementation of a Combined OFDM-Demodulation and WCDMA-Equalization Module

    For a dual-mode baseband receiver for the OFDM Wireless LAN and WCDMA standards, integration of the demodulation and equalization tasks on a dedicated hardware module has been investigated. For OFDM demodulation, an FFT algorithm based on cascaded twiddle factor decomposition has been selected. This type of algorithm combines high spatial and temporal regularity in the FFT data-flow graphs with a minimal number of computations. A frequency-domain algorithm based on a circulant channel approximation has been selected for WCDMA equalization. It has good performance, low hardware complexity and a low number of computations. Its main advantage is the reuse of the FFT kernel, which contributes to the integration of both tasks. The demodulation and equalization module has been described at the register transfer level with the in-house developed Arx language. The core of the module is a pipelined radix-2³ butterfly combined with a complex multiplier and complex divider. The module has an area of 0.447 mm² in 0.18 µm technology and a power consumption of 10.6 mW. The proposed module compares favorably with solutions reported in the literature.
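    The reuse of the FFT kernel for equalization can be illustrated with a minimal sketch. It assumes an exactly circulant channel (circular convolution) and a zero-forcing division per frequency bin; the abstract does not state which equalizer criterion the module uses, and the names `fft`, `equalize`, `sent` and `taps` are hypothetical. A plain recursive radix-2 FFT stands in for the pipelined butterfly described above.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def ifft(X):
    """Inverse transform: same kernel, conjugate twiddles, scaled by 1/n."""
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def circular_convolve(x, h):
    """Channel model under the circulant approximation (assumption)."""
    n = len(x)
    return [sum(h[m] * x[(i - m) % n] for m in range(len(h))) for i in range(n)]

def equalize(received, channel_taps):
    """Zero-forcing equalizer: divide by the channel response in each bin."""
    n = len(received)
    h_padded = list(channel_taps) + [0j] * (n - len(channel_taps))
    H = fft(h_padded)
    Y = fft(received)
    X = [y / hk for y, hk in zip(Y, H)]  # assumes no bin of H is zero
    return ifft(X)

sent = [1, -1, 1, 1, -1, -1, 1, -1]   # hypothetical chip sequence
taps = [1.0, 0.5, 0.2]                # hypothetical channel impulse response
received = circular_convolve(sent, taps)
recovered = equalize(received, taps)
```

    The same `fft` routine serves the forward transforms and, with conjugated twiddle factors, the inverse transform, so the only extra arithmetic is one complex division per bin.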

    Characterization and Acceleration of High Performance Compute Workloads


    Analysing Astronomy Algorithms for GPUs and Beyond

    Astronomy depends on ever-increasing computing power. Processor clock-rates have plateaued, and increased performance is now appearing in the form of additional processor cores on a single chip. This poses significant challenges to the astronomy software community. Graphics Processing Units (GPUs), now capable of general-purpose computation, exemplify both the difficult learning curve and the significant speedups exhibited by massively-parallel hardware architectures. We present a generalised approach to tackling this paradigm shift, based on the analysis of algorithms. We describe a small collection of foundation algorithms relevant to astronomy and explain how they may be used to ease the transition to massively-parallel computing architectures. We demonstrate the effectiveness of our approach by applying it to four well-known astronomy problems: Hogbom CLEAN, inverse ray-shooting for gravitational lensing, pulsar dedispersion and volume rendering. Algorithms with well-defined memory access patterns and high arithmetic intensity stand to receive the greatest performance boost from massively-parallel architectures, while those that involve a significant amount of decision-making may struggle to take advantage of the available processing power. Comment: 10 pages, 3 figures, accepted for publication in MNRAS
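    As an illustration of an algorithm with a well-defined memory access pattern, the following is a minimal sketch of textbook brute-force incoherent dedispersion: each frequency channel is shifted by its dispersion delay and the channels are summed. It is not taken from the paper; the function names, the synthetic data and the chosen frequencies, DM and sampling time are all assumptions.

```python
def dispersion_delay_s(dm, f_lo_mhz, f_hi_mhz):
    """Cold-plasma delay of f_lo relative to f_hi in seconds (DM in pc cm^-3)."""
    return 4.148808e3 * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)

def dedisperse(data, delays):
    """Sum channels after shifting each by its delay (in samples).

    data[c][t] is the time series of channel c; every output sample reads
    one fixed-offset element per channel, so accesses are fully predictable.
    """
    n_out = len(data[0]) - max(delays)
    return [sum(chan[t + d] for chan, d in zip(data, delays))
            for t in range(n_out)]

# Synthetic example (all values hypothetical).
freqs_mhz = [1500.0, 1450.0, 1400.0, 1350.0]
dm, tsamp_s, pulse_at = 50.0, 0.001, 10
delays = [round(dispersion_delay_s(dm, f, freqs_mhz[0]) / tsamp_s)
          for f in freqs_mhz]

# Inject a unit pulse that arrives later in the lower-frequency channels.
data = [[0.0] * 64 for _ in freqs_mhz]
for chan, d in zip(data, delays):
    chan[pulse_at + d] = 1.0

series = dedisperse(data, delays)
peak_index = series.index(max(series))
```

    Every output sample is independent of the others, which is why this kind of loop maps naturally onto one thread per output sample on massively-parallel hardware.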

    State of the art baseband DSP platforms for Software Defined Radio: A survey

    Software Defined Radio (SDR) is an innovative approach which is becoming an increasingly promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by different universities and industries to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, as well as potential future trends. Peer reviewed

    BEEBS: Open Benchmarks for Energy Measurements on Embedded Platforms

    This paper presents and justifies an open benchmark suite named BEEBS, targeted at evaluating the energy consumption of embedded processors. We explore the possible sources of energy consumption, then select individual benchmarks from contemporary suites to cover these areas. Version one of BEEBS is presented here and contains 10 benchmarks that cover a wide range of typical embedded applications. The benchmark suite is portable across diverse architectures and is freely available. The benchmark suite is extensively evaluated, and the properties of its constituent programs are analysed. Using real hardware platforms we show case examples which illustrate the difference in power dissipation between three processor architectures and their related ISAs. We observe significant differences in the average instruction dissipation between the architectures of 4.4x, specifically 170 µW/MHz (ARM Cortex-M0), 65 µW/MHz (Adapteva Epiphany) and 88 µW/MHz (XMOS XS1-L1).

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, a hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up compute-intensive tasks such as matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.

    Mixed-length SIMD code generation for VLIW architectures with multiple native vector-widths

    The degree of data-level parallelism (DLP) in applications is not fixed and varies with their computational characteristics. In contrast, most processors today include single-width SIMD (vector) hardware to exploit DLP. However, single-width SIMD architectures may not be optimal for applications with varying DLP, and they may cause performance and energy inefficiency. We propose the usage of VLIW processors with multiple native vector-widths to better serve applications with changing DLP. SHAVE is an example of such a VLIW processor and provides hardware support for native 32-bit and 128-bit wide vector operations. This paper researches and implements mixed-length SIMD code generation support for the SHAVE processor. More specifically, we target generating 32-bit and 128/64-bit SIMD code for the native 32-bit and 128-bit wide vector units of the SHAVE processor. In this way, we improved the performance of compiler-generated SIMD code by reducing the number of overhead operations and by increasing the SIMD hardware utilization. Experimental results demonstrated that our methodology implemented in the compiler improves the performance of synthetic benchmarks by up to 47%.
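    The utilization argument can be made concrete with a toy calculation. This is not the paper's code generation algorithm: it only counts how many vector lanes are issued when a fixed number of elements is covered greedily with the available vector widths. The 8-bit element size (giving 16, 8 and 4 lanes for 128-, 64- and 32-bit operations), the element count of 28 and the function names are assumptions made for illustration.

```python
def lane_plan(n_elems, lane_widths):
    """Greedily cover n_elems with the widest operations first.

    Returns {lanes_per_op: op_count}. Any remainder is handled by one
    partly filled operation of the narrowest available width.
    """
    widths = sorted(lane_widths, reverse=True)
    plan = {}
    remaining = n_elems
    for lanes in widths:
        plan[lanes], remaining = divmod(remaining, lanes)
    if remaining:
        plan[widths[-1]] += 1
    return plan

def utilization(n_elems, plan):
    """Fraction of issued lanes that carry useful elements."""
    issued = sum(lanes * count for lanes, count in plan.items())
    return n_elems / issued

# 28 eight-bit elements (hypothetical workload).
single = lane_plan(28, [16])         # only 128-bit operations available
mixed = lane_plan(28, [16, 8, 4])    # 128-, 64- and 32-bit operations

single_util = utilization(28, single)
mixed_util = utilization(28, mixed)
```

    With a single width, the second 128-bit operation is only partly filled; with mixed widths every issued lane carries data, at the cost of issuing more, narrower operations. Whether that trade pays off depends on the overhead operations the abstract refers to, which this sketch does not model.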