15,638 research outputs found

Square-rich fixed point polynomial evaluation on FPGAs

Author: Fahmy Suhaib A.
McLoughlin Ian V.
Xu Simin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, Horner's rule, involves the fewest steps but can suffer increased latency because its computation is serial. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm that restructures the evaluation process to include more squaring steps. By using a squarer design that is more efficient than general multiplication, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied to fixed point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule with only a 4% area overhead in evaluating 5th degree polynomials.
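
The two baseline schemes named in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the low-to-high coefficient ordering are assumptions made here, and neither function is the paper's square-rich algorithm or its FPGA design. Horner's rule chains one multiply-add per coefficient, while Estrin's method pairs terms so that each level could run in parallel, squaring the power of x between levels.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) serially, one multiply-add per coefficient."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def estrin(coeffs, x):
    """Evaluate the same polynomial by pairing adjacent terms level by level.

    Every pair within a level is independent (hence parallelisable in
    hardware); the only serial dependency is the repeated squaring of x.
    """
    terms = list(coeffs)
    power = x  # holds x, x**2, x**4, ... on successive levels
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)  # pad so every term has a partner
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power
    return terms[0] if terms else 0


# A 5th degree example: 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5 at x = 2
print(horner([1, 2, 3, 4, 5, 6], 2))  # 321
print(estrin([1, 2, 3, 4, 5, 6], 2))  # 321
```

In this sketch Horner's rule needs five dependent multiply-adds for a 5th degree polynomial, whereas Estrin's method needs three levels, which is the latency versus hardware trade-off the abstract describes.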

Kent Academic Repository

Complexity Analysis of Reed-Solomon Decoding over GF(2^m) Without Using Syndromes

Author: Chen Ning
Yan Zhiyuan
Publication venue
Publication date: 01/01/2008

For the majority of the applications of Reed-Solomon (RS) codes, hard decision decoding is based on syndromes. Recently, there has been renewed interest in decoding RS codes without using syndromes. In this paper, we investigate the complexity of syndromeless decoding for RS codes, and compare it to that of syndrome-based decoding. Aiming to provide guidelines to practical applications, our complexity analysis differs in several aspects from existing asymptotic complexity analysis, which is typically based on multiplicative fast Fourier transform (FFT) techniques and is usually in big O notation. First, we focus on RS codes over characteristic-2 fields, over which some multiplicative FFT techniques are not applicable. Secondly, due to moderate block lengths of RS codes in practice, our analysis is complete since all terms in the complexities are accounted for. Finally, in addition to fast implementation using additive FFT techniques, we also consider direct implementation, which is still relevant for RS codes with moderate lengths. Comparing the complexities of both syndromeless and syndrome-based decoding algorithms based on direct and fast implementations, we show that syndromeless decoding algorithms have higher complexities than syndrome-based ones for high rate RS codes regardless of the implementation. Both errors-only and errors-and-erasures decoding are considered in this paper. We also derive tighter bounds on the complexities of fast polynomial multiplications based on Cantor's approach and the fast extended Euclidean algorithm. Comment: 11 pages, submitted to EURASIP Journal on Wireless Communications and Networking

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

Directory of Open Access Journals

Efficient Explicit Time Stepping of High Order Discontinuous Galerkin Schemes for Waves

Author: Kormann Katharina
Kronbichler Martin
Schoeder Svenja
Wall Wolfgang
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2018

This work presents algorithms for the efficient implementation of discontinuous Galerkin methods with explicit time stepping for acoustic wave propagation on unstructured meshes of quadrilaterals or hexahedra. A crucial step towards efficiency is to evaluate operators in a matrix-free way with sum-factorization kernels. The method allows for general curved geometries and variable coefficients. Temporal discretization is carried out by low-storage explicit Runge-Kutta schemes and the arbitrary derivative (ADER) method. For ADER, we propose a flexible basis change approach that combines cheap face integrals with cell evaluation using collocated nodes and quadrature points. Additionally, a degree reduction for the optimized cell evaluation is presented to decrease the computational cost when evaluating higher order spatial derivatives as required in ADER time stepping. We analyze and compare the performance of state-of-the-art Runge-Kutta schemes and ADER time stepping with the proposed optimizations. ADER involves fewer operations and additionally reaches higher throughput through higher arithmetic intensity, and hence decreases the required computational time significantly. Comparing Runge-Kutta and ADER at their respective CFL stability limits shows ADER to be especially beneficial for higher orders, where the Butcher barrier implies a disproportionate number of stages. Moreover, vector updates in explicit Runge-Kutta schemes are shown to take a substantial share of the computational time due to their memory intensity.

arXiv.org e-Print Archive

MPG.PuRe

Fast and Accurate Bilateral Filtering using Gauss-Polynomial Decomposition

Author: Chaudhury Kunal N.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/05/2015

The bilateral filter is a versatile non-linear filter that has found diverse applications in image processing, computer vision, computer graphics, and computational photography. A widely-used form of the filter is the Gaussian bilateral filter in which both the spatial and range kernels are Gaussian. A direct implementation of this filter requires $O(\sigma^2)$ operations per pixel, where $\sigma$ is the standard deviation of the spatial Gaussian. In this paper, we propose an accurate approximation algorithm that can cut down the computational complexity to $O(1)$ per pixel for any arbitrary $\sigma$ (constant-time implementation). This is based on the observation that the range kernel operates via the translations of a fixed Gaussian over the range space, and that these translated Gaussians can be accurately approximated using the so-called Gauss-polynomials. The overall algorithm emerging from this approximation involves a series of spatial Gaussian filtering operations, which can be implemented in constant-time using separability and recursion. We present some preliminary results to demonstrate that the proposed algorithm compares favorably with some of the existing fast algorithms in terms of speed and accuracy. Comment: To appear in the IEEE International Conference on Image Processing (ICIP 2015)
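
For orientation, the direct Gaussian bilateral filter that this abstract takes as its baseline can be sketched as below. This is a brute-force illustration in plain Python, not the paper's Gauss-polynomial approximation. The function name `bilateral_direct`, the list-of-lists image format, and the truncation of the spatial window at three standard deviations are all assumptions made for the example. The window has side length proportional to sigma, so the work per pixel grows with the square of sigma.

```python
import math


def bilateral_direct(img, sigma_s, sigma_r):
    """Brute-force Gaussian bilateral filter on a 2D list of floats.

    sigma_s: standard deviation of the spatial Gaussian (in pixels).
    sigma_r: standard deviation of the range Gaussian (in intensity units).
    The spatial window is cut off at 3 * sigma_s (an assumption of this
    sketch), so each pixel visits on the order of sigma_s**2 neighbours.
    """
    h, w = len(img), len(img[0])
    radius = int(math.ceil(3 * sigma_s))
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num = den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        # spatial weight depends on distance in the image plane
                        ws = math.exp(-(di * di + dj * dj) / (2 * sigma_s ** 2))
                        # range weight depends on the intensity difference
                        diff = img[ii][jj] - img[i][j]
                        wr = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        num += ws * wr * img[ii][jj]
                        den += ws * wr
            out[i][j] = num / den  # den >= 1: the centre pixel has weight 1
    return out
```

With a small range deviation, neighbours across a sharp edge receive almost no weight, so the edge survives the smoothing; that is the behaviour the fast approximation has to reproduce.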

arXiv.org e-Print Archive

Crossref

Open Access Repository of IISc Research Publications

O(1) Computation of Legendre polynomials and Gauss-Legendre nodes and weights for parallel computing

Author: Bogaert Ignace
Fostier Jan
Michiels Bart
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2012

A self-contained set of algorithms is proposed for the fast evaluation of Legendre polynomials of arbitrary degree and argument in [-1, 1]. More specifically, the time required to evaluate any Legendre polynomial, regardless of argument and degree, is bounded by a constant; i.e., the complexity is O(1). The proposed algorithm also immediately yields an O(1) algorithm for computing an arbitrary Gauss-Legendre quadrature node. Such a capability is crucial for efficiently performing certain parallel computations with high order Legendre polynomials, such as computing an integral in parallel by means of Gauss-Legendre quadrature and the parallel evaluation of Legendre series. In order to achieve the O(1) complexity, novel efficient asymptotic expansions are derived and used alongside known results. A C++ implementation is available from the authors that includes the evaluation routines of the Legendre polynomials and Gauss-Legendre quadrature rules.
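
As a point of comparison for the O(1) claim, the textbook way to evaluate a Legendre polynomial is the three-term (Bonnet) recurrence, (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x), whose cost grows linearly with the degree. The sketch below shows that baseline only; it is not the authors' asymptotic-expansion method or their C++ code, and the function name `legendre` is chosen here for illustration.

```python
def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) with the three-term recurrence.

    Cost is O(n) multiply-adds, in contrast to the constant-time
    evaluation described in the abstract.
    """
    if n < 0:
        raise ValueError("degree must be non-negative")
    p_prev, p = 1.0, float(x)  # P_0(x) and P_1(x)
    if n == 0:
        return p_prev
    for k in range(1, n):
        # (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


print(legendre(2, 0.5))  # P_2(x) = (3x^2 - 1) / 2, so -0.125
```

Because step k depends on steps k-1 and k-2, the recurrence cannot evaluate a single high-degree polynomial in parallel, which is the limitation the abstract's O(1) routines are meant to remove.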

Ghent University Academic Bibliography

Computing Real Roots of Real Polynomials ... and now For Real!

Author: Kobel Alexander
Rouillier Fabrice
Sagraloff Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Very recent work introduces an asymptotically fast subdivision algorithm, denoted ANewDsc, for isolating the real roots of a univariate real polynomial. The method combines Descartes' Rule of Signs to test intervals for the existence of roots, Newton iteration to speed up convergence against clusters of roots, and approximate computation to decrease the required precision. It achieves record bounds on the worst-case complexity for the considered problem, matching the complexity of Pan's method for computing all complex roots and improving upon the complexity of other subdivision methods by several magnitudes. In the article at hand, we report on an implementation of ANewDsc on top of the RS root isolator. RS is a highly efficient realization of the classical Descartes method and currently serves as the default real root solver in Maple. We describe crucial design changes within ANewDsc and RS that led to a high-performance implementation without harming the theoretical complexity of the underlying algorithm. With an excerpt of our extensive collection of benchmarks, available online at http://anewdsc.mpi-inf.mpg.de/, we illustrate that the theoretical gain in performance of ANewDsc over other subdivision methods also transfers into practice. These experiments also show that our new implementation outperforms both RS and mature competitors by magnitudes for notoriously hard instances with clustered roots. For all other instances, we avoid almost any overhead by integrating additional optimizations and heuristics. Comment: Accepted for presentation at the 41st International Symposium on Symbolic and Algebraic Computation (ISSAC), July 19-22, 2016, Waterloo, Ontario, Canada
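
The root-existence test mentioned in this abstract rests on Descartes' Rule of Signs: the number of positive real roots of a real polynomial, counted with multiplicity, is at most the number of sign changes in its coefficient sequence, and differs from it by an even number. The sketch below counts those sign changes and nothing more. It is not ANewDsc or RS, which apply the test to subdivided intervals after a change of variable, and the function name `sign_variations` is hypothetical.

```python
def sign_variations(coeffs):
    """Count sign changes in a coefficient sequence, skipping zeros.

    By Descartes' Rule of Signs the result bounds the number of positive
    real roots (with multiplicity) and has the same parity as that number.
    The count is the same whether coefficients are listed low-to-high or
    high-to-low.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6 has three positive roots
print(sign_variations([-6, 11, -6, 1]))  # 3
# x^2 + 1 has no real roots at all
print(sign_variations([1, 0, 1]))        # 0
```

A count of zero proves there is no positive root and a count of one proves there is exactly one, which is why subdivision methods keep splitting an interval until the transformed polynomial yields one of those two outcomes.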

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

MPG.PuRe

A matrix-free high-order discontinuous Galerkin compressible Navier-Stokes solver: A performance comparison of compressible and incompressible formulations for turbulent incompressible flows

Author: Arndt
Arnold
Bassi
Beck
Beck
Brown
Cantwell
Carton de Wiart
Carton de Wiart
Chapelier
Del Alamo
Fehn
Fehn
Fehn
Fernandez
Fischer
Flad
Flad
Franciolini
Gassner
Guermond
Hager
Hartmann
Hesthaven
Hillewaert
Hindenlang
Joshi
Karniadakis
Kennedy
Kirby
Kopriva
Krank
Krank
Kronbichler
Kronbichler
Kronbichler
Kronbichler
Kubatko
Mengaldo
Moser
Moura
Orszag
Pazner
Steinmoeller
Taylor
Toulorge
Uranga
Vos
Wang
Publication venue: 'Wiley'
Publication date: 01/01/2018

Both compressible and incompressible Navier-Stokes solvers can be used and are used to solve incompressible turbulent flow problems. In the compressible case, the Mach number is then considered as a solver parameter that is set to a small value, $\mathrm{M}\approx 0.1$, in order to mimic incompressible flows. This strategy is widely used for high-order discontinuous Galerkin discretizations of the compressible Navier-Stokes equations. The present work raises the question regarding the computational efficiency of compressible DG solvers as compared to a genuinely incompressible formulation. Our contributions to the state-of-the-art are twofold: Firstly, we present a high-performance discontinuous Galerkin solver for the compressible Navier-Stokes equations based on a highly efficient matrix-free implementation that targets modern cache-based multicore architectures. The performance results presented in this work focus on the node-level performance and our results suggest that there is great potential for further performance improvements for current state-of-the-art discontinuous Galerkin implementations of the compressible Navier-Stokes equations. Secondly, this compressible Navier-Stokes solver is put into perspective by comparing it to an incompressible DG solver that uses the same matrix-free implementation. We discuss algorithmic differences between both solution strategies and present an in-depth numerical investigation of the performance. The considered benchmark test cases are the three-dimensional Taylor-Green vortex problem as a representative of transitional flows and the turbulent channel flow problem as a representative of wall-bounded turbulent flows.

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Recommended from our members

Comparison of Current Gravity Estimation and Determination Models

Author: Hillman Kyle
Publication venue
Publication date: 01/05/2018

This paper will discuss the history of gravity estimation and determination models while analyzing methods that are in development. Some fundamental methods for calculating the gravity field include spherical harmonics solutions, local weighted interpolation, and global point mascon modeling (PMC). Recently, high accuracy measurements have become more accessible, and the requirements for high order geopotential modeling have become more stringent. Interest in irregular bodies, accurate models of the hydrological system, and on-board processing has demanded a comprehensive model that can quickly and accurately compute the geopotential with low memory costs. This trade study of current geopotential modeling techniques will reveal that each modeling technique has a unique use case. It is notable that the spherical harmonics model is relatively accurate but poses a cumbersome inversion problem. PMC and interpolation models, on the other hand, are computationally efficient, but require more research to become robust models with high levels of accuracy. Considerations of the trade study will suggest further research for the point mascon model. The PMC model should be improved through mascon refinement, direct solutions that stem from geodetic measurements, and further validation of the gravity gradient. Finally, the potential for each model to be implemented with parallel computation will be shown to lead to large improvements in computing time while reducing the memory cost for each technique. Aerospace Engineering and Engineering Mechanics

Texas ScholarWorks