
    Analytical High-level Power model for LUT-based Components

    This paper presents an extended high-level model for logic power estimation of multipliers and adders implemented in FPGAs in the presence of glitching and correlation. The model is based on an analytical computation of the switching activity produced in the component and on the FPGA implementation details of the component structure. It is extended to consider operands of different word-lengths, both zero-mean and non-zero-mean signals, and the glitching produced inside the component, taking into account the sign of the autocorrelation coefficients of the components' inputs. The number of simulations needed for model characterization is extremely small and can be reduced to only two. As the final power model is analytical, it is capable of providing power estimates in milliseconds. The results show that the mean relative error is within 10% of low-level power estimates given by the XPower tool.
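    The abstract does not give the analytical model itself; as a rough illustration of the underlying idea, the sketch below uses the standard first-order activity model in which a bit's switching activity falls with its lag-1 autocorrelation, and folds the per-bit activities into a dynamic-power sum. All names and constants are hypothetical, and this is not the paper's model.

```python
def switching_activity(p_one, rho):
    """Per-bit switching activity for a bit that is 1 with probability
    p_one and has lag-1 temporal correlation rho.  For uncorrelated
    bits (rho = 0) this reduces to the textbook 2*p*(1-p)."""
    return 2.0 * p_one * (1.0 - p_one) * (1.0 - rho)

def dynamic_power(activities, c_eff, vdd, freq):
    """Dynamic power estimate P = f * Vdd^2 * sum_i(alpha_i * C_i),
    with one effective capacitance c_eff per LUT output for simplicity."""
    return freq * vdd ** 2 * sum(a * c_eff for a in activities)
```

    A fully correlated input (rho = 1) produces no transitions, so correlated operands can cost much less power than an uncorrelated white-noise assumption would predict.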

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs, whose programming is conventionally an RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve FPGA programmability, programmers still face the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels and, more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations, based on the result of the analytical-model-based design space exploration, to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels.
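    Analytical-model-based design space exploration of the kind described here can be pictured as a brute-force search over candidate configurations, pruned by a resource budget and ranked by a closed-form latency estimate. The toy model below (op counts, LUT costs, and a single unroll knob are all invented for illustration, not AutoAccel's actual model) returns the fastest configuration that fits.

```python
def explore(kernel_ops, luts_avail, lut_per_pe, max_unroll=64):
    """Exhaustive DSE over an unroll factor, using a toy analytical model:
    cycles ~ ceil(ops / unroll) and LUT usage ~ unroll * lut_per_pe.
    Returns (unroll, cycles, luts) of the fastest feasible design,
    or None if nothing fits the budget."""
    best = None
    for unroll in range(1, max_unroll + 1):
        luts = unroll * lut_per_pe
        if luts > luts_avail:          # resource budget exceeded: stop
            break
        cycles = -(-kernel_ops // unroll)  # ceiling division
        if best is None or cycles < best[1]:
            best = (unroll, cycles, luts)
    return best
```

    Because the model is analytical, thousands of such candidates can be evaluated in microseconds, whereas synthesizing even one of them would take minutes to hours.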

    Semi-empirical model of MOST and passive devices focused on narrowband RF blocks

    This paper presents a semi-empirical model of MOSTs and passive elements to be used in narrowband radiofrequency blocks in nanometer technologies. The model is based on a small set of look-up tables (LUTs) obtained via electrical simulations. The MOST description is valid in all inversion regions, and the data are extracted as a function of the gm/ID characteristic; for the passive devices, the LUTs include a simplified model of the element and its principal parasitics at the working frequency f0. These semi-empirical models are validated by designing a set of 2.4-GHz LNAs and 2.4-GHz and 5-GHz VCOs in three different MOST inversion regions.
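    The gm/ID design flow that such LUTs enable can be sketched as follows: pick a gm/ID operating point, interpolate the simulated current density ID/W from the table, and size the transistor from the required transconductance. The table values and function names below are purely illustrative, not from any real PDK or from this paper.

```python
import bisect

# Hypothetical gm/ID look-up table (illustrative numbers only):
# for each gm/ID sample point, the simulated current density ID/W.
GM_ID = [5.0, 10.0, 15.0, 20.0, 25.0]    # gm/ID in 1/V
ID_W  = [8e-6, 3e-6, 1e-6, 3e-7, 8e-8]   # ID/W in A/um

def width_for(gm_target, gm_id):
    """Size a MOST from the LUT: linearly interpolate ID/W at the chosen
    gm/ID point, then W = ID / (ID/W) with ID = gm / (gm/ID)."""
    i = bisect.bisect_left(GM_ID, gm_id)
    i = max(1, min(i, len(GM_ID) - 1))       # clamp to table range
    t = (gm_id - GM_ID[i - 1]) / (GM_ID[i] - GM_ID[i - 1])
    id_w = ID_W[i - 1] + t * (ID_W[i] - ID_W[i - 1])
    i_d = gm_target / gm_id                  # drain current for target gm
    return i_d / id_w                        # width in um
```

    Low gm/ID values correspond to strong inversion (small, fast devices), high values to weak inversion (large, power-efficient devices), which is why a single table spanning all inversion regions is so useful for RF block sizing.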

    A fast empirical method for galaxy shape measurements in weak lensing surveys

    We describe a simple and fast method to correct ellipticity measurements of galaxies for the distortion by the instrumental and atmospheric point spread function (PSF), in view of weak lensing shear measurements. The method performs a classification of galaxies and associated PSFs according to measured shape parameters, and corrects the measured galaxy ellipticities by querying a large lookup table (LUT) built by supervised learning. We have applied this new method to the GREAT10 image analysis challenge, and present in this paper a refined solution that obtains the competitive quality factor of Q = 104, without any shear power spectrum denoising or training. Of particular interest is the efficiency of the method, with a processing time below 3 ms per galaxy on an ordinary CPU.
    Comment: 8 pages, 6 figures. Metric values updated according to the final GREAT10 analysis software (Kitching et al. 2012, MNRAS 423, 3163-3208); no qualitative changes. Associated code available at http://lastro.epfl.ch/megalu
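    A LUT correction learned by supervision can be reduced to a very small sketch: bin training galaxies by a measured shape parameter, store the mean residual between true and measured ellipticity per bin, and apply that residual at query time. This one-parameter, one-component toy (the real method classifies on several shape parameters of both galaxy and PSF) uses invented names throughout.

```python
def build_lut(train, n_bins=10, lo=0.0, hi=1.0):
    """Supervised LUT construction: for each bin of the measured shape
    parameter, store the mean residual (true - measured) ellipticity."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for measured, true in train:
        b = min(int((measured - lo) / (hi - lo) * n_bins), n_bins - 1)
        sums[b] += true - measured
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def correct(measured, lut, lo=0.0, hi=1.0):
    """Correct one measurement by a simple LUT query."""
    b = min(int((measured - lo) / (hi - lo) * len(lut)), len(lut) - 1)
    return measured + lut[b]
```

    Since the correction is a single table lookup per galaxy, per-object cost is dominated by the shape measurement itself, which is what makes millisecond-scale processing plausible.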

    Efficiency analysis methodology of FPGAs based on lost frequencies, area and cycles

    We propose a methodology to study and quantify efficiency and the impact of overheads on runtime performance. Most work on High-Performance Computing (HPC) for FPGAs studies only runtime performance or cost, while we are interested in how far we are from peak performance and, more importantly, why. The efficiency of runtime performance is defined with respect to the ideal computational runtime in the absence of inefficiencies. Analyzing the difference between actual and ideal runtime reveals the overheads and bottlenecks. A formal approach is proposed to decompose the efficiency into three components: frequency, area and cycles. After quantifying the efficiencies, a detailed analysis has to reveal the reasons for the lost frequencies, lost area and lost cycles. We propose a taxonomy of possible causes and practical methods to identify and quantify the overheads. The methodology is applied to a number of use cases for illustration. We show how the three components of efficiency interact and how bottlenecks are revealed.
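    The three-way decomposition can be written as a product of ratios: achieved versus peak clock frequency, used versus available compute area, and ideal versus actual cycle count. The helper below illustrates that multiplicative structure; the specific inputs (PE counts as an area proxy) are a simplifying assumption, not the paper's exact formulation.

```python
def efficiency(f_actual, f_peak, pe_used, pe_peak, cycles_ideal, cycles_actual):
    """Decompose runtime efficiency into frequency, area and cycle factors.
    Each factor is in (0, 1]; overall efficiency is their product, so a
    shortfall in any one component caps the whole design."""
    e_f = f_actual / f_peak          # lost frequency
    e_a = pe_used / pe_peak          # lost area
    e_c = cycles_ideal / cycles_actual  # lost cycles (stalls, overhead)
    return e_f * e_a * e_c, (e_f, e_a, e_c)
```

    The multiplicative form makes the diagnosis concrete: a design running at half the peak clock, on half the fabric, with twice the ideal cycle count is at 12.5% of peak, and the factor breakdown says which of the three losses to attack first.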

    Spatial decomposition of on-nucleus spectra of quasar host galaxies

    In order to study the host galaxies of type 1 (broad-line) quasars, we present a semi-analytic modelling method to decompose the on-nucleus spectra of quasars into nuclear and host-galaxy channels. The method uses the spatial information contained in long-slit or slitlet spectra. A routine determines the best-fitting combination of the spatial distributions of the point-like nucleus and the extended host galaxy. Inputs are a simultaneously observed PSF, and external constraints on galaxy morphology from imaging. We demonstrate the capabilities of the method on two samples, totalling 18 quasars, observed with EFOSC at the ESO 3.6m telescope and FORS1 at the ESO VLT. ~50% of the host galaxies with successful decomposition show distortions in their rotation curves or peculiar gas velocities above the normal maximum velocities for disks. This is consistent with the fraction from optical imaging. All host galaxies have quite young stellar populations, typically 1-2 Gyr. For the disk-dominated hosts these are consistent with their inactive counterparts; for the bulge-dominated hosts, the luminosity-weighted stellar ages are much younger than those of inactive early-type galaxies. While this presents further evidence for a connection between galaxy interaction and AGN activity for half of the sample, this is not clear for the other half: these are often undistorted disk-dominated host galaxies, and interaction at a lower level might be detected in deeper high-resolution images or deeper spectroscopic data. The velocity information does not show obvious signs of large-scale outflows triggered by AGN feedback; the data are consistent with velocity fields created by galaxy interaction.
    Comment: Accepted for publication in MNRAS; 19 pages, 12 figures
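    The core of such a spatial decomposition is, at each wavelength, a linear fit of the observed spatial profile as nucleus plus host: amplitude a times the PSF profile plus amplitude b times a host profile. The two-component least-squares solve below (via the 2x2 normal equations) is a minimal, hypothetical sketch of that step, not the paper's full routine, which also uses morphology constraints from imaging.

```python
def decompose(observed, psf, host):
    """Return (nucleus_amp, host_amp) minimizing
    sum((observed - a*psf - b*host)**2) via the 2x2 normal equations."""
    spp = sum(p * p for p in psf)
    shh = sum(h * h for h in host)
    sph = sum(p * h for p, h in zip(psf, host))
    sop = sum(o * p for o, p in zip(observed, psf))
    soh = sum(o * h for o, h in zip(observed, host))
    det = spp * shh - sph * sph      # singular when psf and host profiles coincide
    a = (sop * shh - soh * sph) / det
    b = (soh * spp - sop * sph) / det
    return a, b
```

    Repeating this fit wavelength by wavelength yields two spectra, one for the nucleus channel and one for the host channel, which is exactly the separation the stellar-population and velocity analysis then relies on.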

    Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning

    Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee error-free operation after SEU recovery if the affected configuration bits belong to feedback loops of the implemented circuits. In this paper, we a) provide a netlist-based circuit analysis technique to distinguish so-called critical configuration bits from essential bits, in order to identify which configuration bits will also need state-restoring actions after a recovered SEU and which will not. Furthermore, b) an alternative classification approach using fault injection is developed in order to compare both classification techniques. Moreover, c) we propose a floorplanning approach for reducing the effective number of scrubbed frames, and d) experimental results give evidence that our optimization methodology not only allows errors to be detected earlier but also considerably reduces the Mean-Time-To-Repair (MTTR) of a circuit. In particular, we show that with our approach the MTTR for datapath-intensive circuits can be reduced by up to 48.5% in comparison to standard approaches.
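    Why floorplanning helps MTTR can be seen from a first-order model of cyclic blind scrubbing: an upset lands at a uniformly random point in the scrub cycle, so on average half a cycle passes before the affected frame is rewritten. This toy model (not the paper's analysis) makes the frame count the direct lever.

```python
def mttr(n_frames, t_frame):
    """Expected repair time for cyclic blind scrubbing over n_frames
    frames, each taking t_frame to rewrite: the upset position is
    uniform in the cycle, so MTTR ~ half a full scrub cycle."""
    return n_frames * t_frame / 2.0

# Floorplanning that confines the circuit to fewer configuration frames
# shrinks the cycle: halving the scrubbed region halves the MTTR here.
```

    In this model, a floorplan that packs the design into roughly half as many frames is consistent with MTTR reductions of the magnitude reported in the paper, before even counting the gains from classifying bits as critical versus merely essential.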

    Next-to-Next-to-Leading Electroweak Logarithms for W-Pair Production at LHC

    We derive the high-energy asymptotics of the one- and two-loop corrections, in the next-to-next-to-leading logarithmic approximation, to the differential cross section of WW-pair production at the LHC. For large invariant mass of the W pair, the (negative) one-loop terms can reach more than 40% and are partially compensated by the (positive) two-loop terms of up to 10%.
    Comment: 23 pages, 9 figures; added explanations in section 3, corrected typos and figures 7, 8