2,843 research outputs found

    Fast Power and Energy Efficiency Analysis of FPGA-based Wireless Base-band Processing

    Full text link
    Nowadays, demands for high performance keep on increasing in the wireless communication domain. This leads to a consistent rise of the complexity and designing such systems has become a challenging task. In this context, energy efficiency is considered as a key topic, especially for embedded systems in which design space is often very constrained. In this paper, a fast and accurate power estimation approach for FPGA-based hardware systems is applied to a typical wireless communication system. It aims at providing power estimates of complete systems prior to their implementations. This is made possible by using a dedicated library of high-level models that are representative of hardware IPs. Based on high-level simulations, design space exploration is made a lot faster and easier. The definition of a scenario and the monitoring of IP's time-activities facilitate the comparison of several domain-specific systems. The proposed approach and its benefits are demonstrated through a typical use case in the wireless communication domain.Comment: Presented at HIP3ES, 201

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

    Implementation of JPEG compression and motion estimation on FPGA hardware

    Full text link
    A hardware implementation of JPEG allows for real-time compression in data intensivve applications, such as high speed scanning, medical imaging and satellite image transmission. Implementation options include dedicated DSP or media processors, FPGA boards, and ASICs. Factors that affect the choice of platform selection involve cost, speed, memory, size, power consumption, and case of reconfiguration. The proposed hardware solution is based on a Very high speed integrated circuit Hardware Description Language (VHDL) implememtation of the codec with prefered realization using an FPGA board due to speed, cost and flexibility factors; The VHDL language is commonly used to model hardware impletations from a top down perspective. The VHDL code may be simulated to correct mistakes and subsequently synthesized into hardware using a synthesis tool, such as the xilinx ise suite. The same VHDL code may be synthesized into a number of sifferent hardware architetcures based on constraints given. For example speed was the major constraint when synthesizing the pipeline of jpeg encoding and decoding, while chip area and power consumption were primary constraints when synthesizing the on-die memory because of large area. Thus, there is a trade off between area and speed in logic synthesis

    Computer Architectures to Close the Loop in Real-time Optimization

    Get PDF
    © 2015 IEEE.Many modern control, automation, signal processing and machine learning applications rely on solving a sequence of optimization problems, which are updated with measurements of a real system that evolves in time. The solutions of each of these optimization problems are then used to make decisions, which may be followed by changing some parameters of the physical system, thereby resulting in a feedback loop between the computing and the physical system. Real-time optimization is not the same as fast optimization, due to the fact that the computation is affected by an uncertain system that evolves in time. The suitability of a design should therefore not be judged from the optimality of a single optimization problem, but based on the evolution of the entire cyber-physical system. The algorithms and hardware used for solving a single optimization problem in the office might therefore be far from ideal when solving a sequence of real-time optimization problems. Instead of there being a single, optimal design, one has to trade-off a number of objectives, including performance, robustness, energy usage, size and cost. We therefore provide here a tutorial introduction to some of the questions and implementation issues that arise in real-time optimization applications. We will concentrate on some of the decisions that have to be made when designing the computing architecture and algorithm and argue that the choice of one informs the other

    Digital Complex Correlator for a C-band Polarimetry survey

    Full text link
    The international Galactic Emission Mapping project aims to map and characterize the polarization field of the Milky Way. In Portugal it will cartograph the C-band sky polarized emission of the Northern Hemisphere and provide templates for map calibration and foreground control of microwave space probes like ESA Planck Surveyor mission. The receiver system is equipped with a novel receiver with a full digital back-end using an Altera Field Programmable Gate Array, having a very favorable cost/performance relation. This new digital backend comprises a base-band complex cross-correlator outputting the four Stokes parameters of the incoming polarized radiation. In this document we describe the design and implementation of the complex correlator using COTS components and a processing FPGA, detailing the method applied in the several algorithm stages and suitable for large sky area surveys.Comment: 15 pages, 10 figures; submitted to Experimental Astronomy, Springe

    A software controlled voltage tuning system using multi-purpose ring oscillators

    Full text link
    This paper presents a novel software driven voltage tuning method that utilises multi-purpose Ring Oscillators (ROs) to provide process variation and environment sensitive energy reductions. The proposed technique enables voltage tuning based on the observed frequency of the ROs, taken as a representation of the device speed and used to estimate a safe minimum operating voltage at a given core frequency. A conservative linear relationship between RO frequency and silicon speed is used to approximate the critical path of the processor. Using a multi-purpose RO not specifically implemented for critical path characterisation is a unique approach to voltage tuning. The parameters governing the relationship between RO and silicon speed are obtained through the testing of a sample of processors from different wafer regions. These parameters can then be used on all devices of that model. The tuning method and software control framework is demonstrated on a sample of XMOS XS1-U8A-64 embedded microprocessors, yielding a dynamic power saving of up to 25% with no performance reduction and no negative impact on the real-time constraints of the embedded software running on the processor

    Real-Time Trigger and online Data Reduction based on Machine Learning Methods for Particle Detector Technology

    Get PDF
    Moderne Teilchenbeschleuniger-Experimente generieren während zur Laufzeit immense Datenmengen. Die gesamte erzeugte Datenmenge abzuspeichern, überschreitet hierbei schnell das verfügbare Budget für die Infrastruktur zur Datenauslese. Dieses Problem wird üblicherweise durch eine Kombination von Trigger- und Datenreduktionsmechanismen adressiert. Beide Mechanismen werden dabei so nahe wie möglich an den Detektoren platziert um die gewünschte Reduktion der ausgehenden Datenraten so frühzeitig wie möglich zu ermöglichen. In solchen Systeme traditionell genutzte Verfahren haben währenddessen ihre Mühe damit eine effiziente Reduktion in modernen Experimenten zu erzielen. Die Gründe dafür liegen zum Teil in den komplexen Verteilungen der auftretenden Untergrund Ereignissen. Diese Situation wird bei der Entwicklung der Detektorauslese durch die vorab unbekannten Eigenschaften des Beschleunigers und Detektors während des Betriebs unter hoher Luminosität verstärkt. Aus diesem Grund wird eine robuste und flexible algorithmische Alternative benötigt, welche von Verfahren aus dem maschinellen Lernen bereitgestellt werden kann. Da solche Trigger- und Datenreduktion-Systeme unter erschwerten Bedingungen wie engem Latenz-Budget, einer großen Anzahl zu nutzender Verbindungen zur Datenübertragung und allgemeinen Echtzeitanforderungen betrieben werden müssen, werden oft FPGAs als technologische Basis für die Umsetzung genutzt. Innerhalb dieser Arbeit wurden mehrere Ansätze auf Basis von FPGAs entwickelt und umgesetzt, welche die vorherrschenden Problemstellungen für das Belle II Experiment adressieren. Diese Ansätze werden über diese Arbeit hinweg vorgestellt und diskutiert werden
    corecore