Analytical High-level Power model for LUT-based Components
This paper presents an extended high-level model for logic power estimation of multipliers and adders implemented in FPGAs in the presence of glitching and correlation. The model is based on an analytical computation of the switching activity produced in the component and on the FPGA implementation details of the component structure. It is extended to consider operands of different word-lengths, both zero-mean and non-zero-mean signals, and the glitching produced inside the component, taking into account the sign of the autocorrelation coefficients of the component's inputs. The number of simulations needed for model characterization is extremely small and can be reduced to only two. As the final power model is analytical, it is capable of providing power estimates in milliseconds. The results show that the mean relative error is within 10% of low-level power estimates given by the XPower tool.
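As a toy illustration of why such analytical estimates return in milliseconds, dynamic power can be written as a closed-form product of activity and effective capacitance; the single flat glitch multiplier below is our placeholder assumption, not the paper's autocorrelation-based activity computation:

```python
# Toy analytical dynamic-power estimate: P = N * a * g * C * Vdd^2 * f.
# All parameter names and the flat glitch multiplier are illustrative
# assumptions, not the published model.

def lut_dynamic_power(n_luts, activity, glitch_factor, c_eff, vdd, f_clk):
    """Estimate dynamic power (W) of a component built from n_luts LUTs."""
    return n_luts * activity * glitch_factor * c_eff * vdd ** 2 * f_clk
```

Evaluating a closed-form expression like this replaces a full gate-level power simulation with a handful of multiplications.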
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs, whose programming is conventionally an RTL design practice. Although
recent advances in high-level synthesis (HLS) significantly improve FPGA
programmability, they still leave programmers facing the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels.
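The flavor of analytical-model-driven design space exploration can be sketched as follows; the cost model (a fixed DSP count per lane and ideal cycle scaling) is an invented simplification, not AutoAccel's actual performance model:

```python
# Enumerate parallelism degrees, predict cycles and resource use with a
# simple analytical model, and keep the fastest design that fits the chip.
# dsp_per_lane and the ceil-division cycle model are illustrative assumptions.

def explore(workload_ops, max_dsp, dsp_per_lane, max_parallel=64):
    best = None
    for p in range(1, max_parallel + 1):
        dsps = p * dsp_per_lane
        if dsps > max_dsp:
            break  # design no longer fits the resource budget
        cycles = -(-workload_ops // p)  # ceil(ops / parallelism)
        if best is None or cycles < best[1]:
            best = (p, cycles, dsps)
    return best  # (parallelism, predicted cycles, DSPs used)
```

Because the model is evaluated analytically rather than by synthesizing each candidate, the whole space can be swept in negligible time.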
Semi-empirical model of MOST and passive devices focused on narrowband RF blocks
This paper presents a semi-empirical modeling of MOST and passive elements to be used in narrowband
radiofrequency blocks for nanometer technologies. This model is based on a small set of look-up
tables (LUTs) obtained via electrical simulations. The MOST description is valid for all inversion regions
and the data are extracted as a function of the gm/ID characteristic; for the passive devices the
LUTs include a simplified model of the element and its principal parasitics at the working frequency
f0. These semi-empirical models are validated by designing a set of 2.4-GHz LNAs and 2.4-GHz and
5-GHz VCOs in three different MOST inversion regions.
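A minimal sketch of how a gm/ID lookup table drives transistor sizing: interpolate simulated ID/W against gm/ID, then derive the current and width for a target transconductance. The sample data points below are invented for illustration, not real technology data:

```python
import numpy as np

# Invented gm/ID characteristic (would come from electrical simulation).
gm_id = np.array([5.0, 10.0, 15.0, 20.0, 25.0])   # gm/ID in 1/V
id_w  = np.array([50.0, 20.0, 6.0, 1.5, 0.3])     # ID/W in uA/um (made up)

def size_for_gm(gm_target_mS, gm_id_target):
    """Return (ID in uA, W in um) for a target gm at a chosen gm/ID point."""
    i_d = gm_target_mS * 1e3 / gm_id_target        # ID = gm / (gm/ID)
    idw = np.interp(gm_id_target, gm_id, id_w)     # LUT interpolation
    return i_d, i_d / idw
```

Choosing a larger gm/ID (weaker inversion) trades current for width, which is exactly the design trade-off the all-inversion-region LUTs expose.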
A fast empirical method for galaxy shape measurements in weak lensing surveys
We describe a simple and fast method to correct ellipticity measurements of
galaxies from the distortion by the instrumental and atmospheric point spread
function (PSF), in view of weak lensing shear measurements. The method performs
a classification of galaxies and associated PSFs according to measured shape
parameters, and corrects the measured galaxy ellipticities by querying a large
lookup table (LUT), built by supervised learning. We have applied this new
method to the GREAT10 image analysis challenge, and present in this paper a
refined solution that obtains the competitive quality factor of Q = 104,
without any shear power spectrum denoising or training. Of particular interest
is the efficiency of the method, with a processing time below 3 ms per galaxy
on an ordinary CPU.
Comment: 8 pages, 6 figures. Metric values updated according to the final
GREAT10 analysis software (Kitching et al. 2012, MNRAS 423, 3163-3208), no
qualitative changes. Associated code available at
http://lastro.epfl.ch/megalu
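The core idea of a learned lookup-table correction can be sketched in a few lines: bin objects by a measured shape parameter, store the mean correction per bin from a training set, then subtract the binned correction at query time. The one-dimensional binning and linear toy data below are our simplifying assumptions, not the published table:

```python
import numpy as np

def build_lut(shape_params, true_corrections, n_bins=10):
    """Average the known corrections per bin of a measured shape parameter."""
    edges = np.linspace(shape_params.min(), shape_params.max(), n_bins + 1)
    idx = np.clip(np.digitize(shape_params, edges) - 1, 0, n_bins - 1)
    lut = np.zeros(n_bins)
    for b in range(n_bins):
        sel = idx == b
        lut[b] = true_corrections[sel].mean() if sel.any() else 0.0
    return edges, lut

def correct(e_measured, shape_param, edges, lut):
    """Correct a measured ellipticity by querying the lookup table."""
    b = np.clip(np.digitize(shape_param, edges) - 1, 0, len(lut) - 1)
    return e_measured - lut[b]
```

Because the correction is a table query rather than a model fit per object, the per-galaxy cost stays at the millisecond level reported above.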
Efficiency analysis methodology of FPGAs based on lost frequencies, area and cycles
We propose a methodology to study and quantify efficiency and the impact of overheads on runtime performance. Most work on High-Performance Computing (HPC) for FPGAs studies only runtime performance or cost, while we are interested in how far we are from peak performance and, more importantly, why. The efficiency of runtime performance is defined with respect to the ideal computational runtime in the absence of inefficiencies. The analysis of the difference between actual and ideal runtime reveals the overheads and bottlenecks. A formal approach is proposed to decompose the efficiency into three components: frequency, area and cycles. After quantification of the efficiencies, a detailed analysis has to reveal the reasons for the lost frequencies, lost area and lost cycles. We propose a taxonomy of possible causes and practical methods to identify and quantify the overheads. The methodology is applied to a number of use cases for illustration. We show the interaction between the three components of efficiency and show how bottlenecks are revealed.
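The multiplicative decomposition described above can be sketched directly; the function and parameter names are our own, chosen for illustration:

```python
# Split overall efficiency into frequency, area and cycle factors,
# each a ratio of achieved to ideal. Names are illustrative assumptions.

def efficiency_components(f_achieved, f_peak, area_used, area_total,
                          cycles_ideal, cycles_actual):
    """Return (freq_eff, area_eff, cycle_eff, overall efficiency)."""
    freq_eff = f_achieved / f_peak            # lost frequency
    area_eff = area_used / area_total         # lost area
    cycle_eff = cycles_ideal / cycles_actual  # lost cycles
    return freq_eff, area_eff, cycle_eff, freq_eff * area_eff * cycle_eff
```

Because the overall efficiency is the product of the three factors, a design losing half of each (0.5 x 0.5 x 0.5) achieves only 12.5% of peak, which is why the interaction between the components matters.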
Spatial decomposition of on-nucleus spectra of quasar host galaxies
In order to study the host galaxies of type 1 (broad-line) quasars, we
present a semi-analytic modelling method to decompose the on-nucleus spectra of
quasars into nuclear and host galaxy channels. The method uses the spatial
information contained in long-slit or slitlet spectra. A routine determines the
best fitting combination of the spatial distribution of the point like nucleus
and extended host galaxy. Inputs are a simultaneously observed PSF, and
external constraints on galaxy morphology from imaging. We demonstrate the
capabilities of the method on two samples comprising a total of 18 quasars observed
with EFOSC at the ESO 3.6m telescope and FORS1 at the ESO VLT.
~50% of the host galaxies with successful decomposition show distortions in
their rotation curves or peculiar gas velocities above normal maximum
velocities for disks. This is consistent with the fraction from optical
imaging. All host galaxies have quite young stellar populations, typically 1-2
Gyr. For the disk-dominated hosts these are consistent with their inactive
counterparts, while for the bulge-dominated hosts the luminosity-weighted stellar
ages are much younger than those of inactive early-type galaxies. While this
presents further evidence for a connection of galaxy interaction and AGN
activity for half of the sample, this is not clear for the other half: These
are often undistorted disk dominated host galaxies, and interaction on a
smaller level might be detected in deeper high-resolution images or deeper
spectroscopic data. The velocity information does not show obvious signs of
large-scale outflows triggered by AGN feedback; the data are consistent with
velocity fields created by galaxy interaction.
Comment: Accepted for publication in MNRAS; 19 pages, 12 figures
Optimizing Scrubbing by Netlist Analysis for FPGA Configuration Bit Classification and Floorplanning
Existing scrubbing techniques for SEU mitigation on FPGAs do not guarantee
error-free operation after SEU recovery if the affected configuration bits
belong to feedback loops of the implemented circuits. In this paper, we a)
provide a netlist-based circuit analysis technique to distinguish so-called
critical configuration bits from essential bits in order to identify which
configuration bits also need state-restoring actions after a recovered SEU
and which do not. Furthermore, b) an alternative classification approach
using fault injection is developed in order to compare both classification
techniques. Moreover, c) we propose a floorplanning approach for reducing
the effective number of scrubbed frames, and d) experimental results give
evidence that our optimization methodology not only detects errors earlier
but also considerably minimizes the Mean-Time-To-Repair (MTTR) of a circuit.
In particular, we show that with our approach the MTTR for datapath-intensive
circuits can be reduced by up to 48.5% in comparison to standard approaches.
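The netlist-based distinction hinges on detecting feedback loops: logic on a cycle holds state that scrubbing alone cannot restore. A generic sketch of that cycle test on a netlist graph (adjacency dict of our own invention, with every node appearing as a key) might look like:

```python
# Configuration bits whose logic lies on a feedback loop (a cycle in the
# netlist graph) would be "critical"; the rest are merely "essential".
# This graph representation and algorithm are illustrative, not the paper's.

def nodes_on_cycles(netlist):
    """Return the set of nodes that lie on some cycle (feedback loop)."""
    # Iteratively strip nodes with no remaining incoming or outgoing edges;
    # whatever survives must lie on a cycle.
    nodes = set(netlist)
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            outs = [m for m in netlist.get(n, []) if m in nodes]
            ins = [m for m in nodes if n in netlist.get(m, [])]
            if not outs or not ins:
                nodes.discard(n)
                changed = True
    return nodes
```

Bits configuring the surviving nodes would then be flagged for state restoration in addition to scrubbing.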
Next-to-Next-to-Leading Electroweak Logarithms for W-Pair Production at LHC
We derive the high-energy asymptotics of the one- and two-loop corrections in the
next-to-next-to-leading logarithmic approximation to the differential cross
section of W-pair production at the LHC. For large invariant mass of the
W-pair the (negative) one-loop terms can reach more than 40% and are
partially compensated by the (positive) two-loop terms of up to 10%.
Comment: 23 pages, 9 figures, added explanations in section 3, corrected typos
and figures 7, 8
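Schematically, such electroweak Sudakov expansions take the form below, where keeping the two-loop terms through L^2 corresponds to the next-to-next-to-leading logarithmic approximation; the coefficients a_i, b_i are generic placeholders, not the values derived in the paper:

```latex
\frac{d\sigma}{d\sigma_{\rm Born}} \simeq 1
  + \frac{\alpha}{4\pi}\left(a_2 L^2 + a_1 L + a_0\right)
  + \left(\frac{\alpha}{4\pi}\right)^2\left(b_4 L^4 + b_3 L^3 + b_2 L^2 + \ldots\right),
\qquad L \equiv \ln\frac{s}{M_W^2}
```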