Square-rich fixed point polynomial evaluation on FPGAs
Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the fewest steps but can lead to increased latency due to serial computation. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm, which reformulates the evaluation process to include an increased number of squaring steps. By using a squarer design that is more efficient than general multiplication, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied in fixed-point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule with only a 4% area overhead in evaluating 5th-degree polynomials.
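To make the contrast between the two baseline schemes concrete, here is a minimal Python sketch of Horner's rule and Estrin's method. It illustrates only the serial versus tree-shaped evaluation order described in the abstract; it is not the paper's square-rich algorithm, and the function names `horner` and `estrin` are chosen here for illustration.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Each step depends on the previous one, so the multiply-adds
    form a single serial chain.
    """
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme.

    Adjacent terms are paired as (t0 + t1 * p), (t2 + t3 * p), ...
    with p = x, then x**2, then x**4, and so on. The pairs within
    one level are independent of each other, which is what allows
    parallel hardware to shorten the latency.
    """
    terms = list(coeffs)
    if not terms:
        return 0
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0)  # pad so every term has a partner
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power  # one squaring per level
    return terms[0]
```

For the 5th-degree polynomial with coefficients `[1, 2, 3, 4, 5, 6]` (lowest degree first), both functions return 321 at `x = 2`. Horner's rule needs five dependent multiply-add steps, whereas Estrin's scheme needs three levels.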
Complexity Analysis of Reed-Solomon Decoding over GF(2^m) Without Using Syndromes
For the majority of the applications of Reed-Solomon (RS) codes, hard
decision decoding is based on syndromes. Recently, there has been renewed
interest in decoding RS codes without using syndromes. In this paper, we
investigate the complexity of syndromeless decoding for RS codes, and compare
it to that of syndrome-based decoding. Aiming to provide guidelines to
practical applications, our complexity analysis differs in several aspects from
existing asymptotic complexity analysis, which is typically based on
multiplicative fast Fourier transform (FFT) techniques and is usually in big O
notation. First, we focus on RS codes over characteristic-2 fields, over which
some multiplicative FFT techniques are not applicable. Secondly, due to
moderate block lengths of RS codes in practice, our analysis is complete since
all terms in the complexities are accounted for. Finally, in addition to fast
implementation using additive FFT techniques, we also consider direct
implementation, which is still relevant for RS codes with moderate lengths.
Comparing the complexities of both syndromeless and syndrome-based decoding
algorithms based on direct and fast implementations, we show that syndromeless
decoding algorithms have higher complexities than syndrome-based ones for high
rate RS codes regardless of the implementation. Both errors-only and
errors-and-erasures decoding are considered in this paper. We also derive
tighter bounds on the complexities of fast polynomial multiplications based on
Cantor's approach and the fast extended Euclidean algorithm.
Comment: 11 pages, submitted to EURASIP Journal on Wireless Communications and
Networking
Efficient Explicit Time Stepping of High Order Discontinuous Galerkin Schemes for Waves
This work presents algorithms for the efficient implementation of
discontinuous Galerkin methods with explicit time stepping for acoustic wave
propagation on unstructured meshes of quadrilaterals or hexahedra. A crucial
step towards efficiency is to evaluate operators in a matrix-free way with
sum-factorization kernels. The method allows for general curved geometries and
variable coefficients. Temporal discretization is carried out by low-storage
explicit Runge-Kutta schemes and the arbitrary derivative (ADER) method. For
ADER, we propose a flexible basis change approach that combines cheap face
integrals with cell evaluation using collocated nodes and quadrature points.
Additionally, a degree reduction for the optimized cell evaluation is presented
to decrease the computational cost when evaluating higher order spatial
derivatives as required in ADER time stepping. We analyze and compare the
performance of state-of-the-art Runge-Kutta schemes and ADER time stepping with
the proposed optimizations. ADER involves fewer operations and additionally
reaches higher throughput by higher arithmetic intensities and hence decreases
the required computational time significantly. Comparison of Runge-Kutta and
ADER at their respective CFL stability limit renders ADER especially beneficial
for higher orders when the Butcher barrier implies an overproportional amount
of stages. Moreover, vector updates in explicit Runge-Kutta schemes are shown
to take a substantial amount of the computational time due to their memory
intensity.
Fast and Accurate Bilateral Filtering using Gauss-Polynomial Decomposition
The bilateral filter is a versatile non-linear filter that has found diverse
applications in image processing, computer vision, computer graphics, and
computational photography. A widely-used form of the filter is the Gaussian
bilateral filter in which both the spatial and range kernels are Gaussian. A
direct implementation of this filter requires O(σ^2) operations per
pixel, where σ is the standard deviation of the spatial Gaussian. In
this paper, we propose an accurate approximation algorithm that can cut down
the computational complexity to O(1) per pixel for any arbitrary σ
(constant-time implementation). This is based on the observation that the range
kernel operates via the translations of a fixed Gaussian over the range space,
and that these translated Gaussians can be accurately approximated using the
so-called Gauss-polynomials. The overall algorithm emerging from this
approximation involves a series of spatial Gaussian filtering, which can be
implemented in constant-time using separability and recursion. We present some
preliminary results to demonstrate that the proposed algorithm compares
favorably with some of the existing fast algorithms in terms of speed and
accuracy.
Comment: To appear in the IEEE International Conference on Image Processing
(ICIP 2015)
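For reference, the direct implementation that the abstract takes as its starting point can be sketched as follows. This is a brute-force Gaussian bilateral filter in plain Python, not the Gauss-polynomial approximation the paper proposes. The name `bilateral_filter`, the list-of-lists image format, and the truncation of the window at three spatial standard deviations are assumptions made for this illustration.

```python
import math


def bilateral_filter(image, sigma_s, sigma_r, radius=None):
    """Brute-force Gaussian bilateral filter on a 2D list of numbers.

    sigma_s: standard deviation of the spatial Gaussian.
    sigma_r: standard deviation of the range Gaussian.
    radius:  half-width of the window; defaults to ceil(3 * sigma_s),
             which is an assumption of this sketch.
    """
    if radius is None:
        radius = max(1, math.ceil(3 * sigma_s))
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            centre = image[i][j]
            num = 0.0
            den = 0.0
            # The window has (2 * radius + 1)**2 entries, so the work
            # per pixel grows with the square of the spatial width.
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        diff = image[y][x] - centre
                        w_spatial = math.exp(-(di * di + dj * dj)
                                             / (2.0 * sigma_s ** 2))
                        w_range = math.exp(-(diff * diff)
                                           / (2.0 * sigma_r ** 2))
                        num += w_spatial * w_range * image[y][x]
                        den += w_spatial * w_range
            out[i][j] = num / den  # den >= 1: the centre weight is 1
    return out
```

Because the range weight depends on the intensity of the centre pixel, the inner sum is not a plain convolution, which is why the direct form cannot simply reuse fast Gaussian filtering.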
O(1) Computation of Legendre polynomials and Gauss-Legendre nodes and weights for parallel computing
A self-contained set of algorithms is proposed for the fast evaluation of Legendre polynomials of arbitrary degree and argument in [-1, 1]. More specifically, the time required to evaluate any Legendre polynomial, regardless of argument and degree, is bounded by a constant; i.e., the complexity is O(1). The proposed algorithm also immediately yields an O(1) algorithm for computing an arbitrary Gauss-Legendre quadrature node. Such a capability is crucial for efficiently performing certain parallel computations with high-order Legendre polynomials, such as computing an integral in parallel by means of Gauss-Legendre quadrature and the parallel evaluation of Legendre series. In order to achieve the O(1) complexity, novel efficient asymptotic expansions are derived and used alongside known results. A C++ implementation is available from the authors that includes the evaluation routines of the Legendre polynomials and Gauss-Legendre quadrature rules.
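As a point of comparison for the O(1) claim, the textbook way to evaluate a Legendre polynomial is Bonnet's three-term recurrence, (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x), whose cost grows linearly with the degree. The Python sketch below shows that baseline only; it is not the asymptotic-expansion method of the paper, and the function name `legendre` is chosen here for illustration.

```python
def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) by Bonnet's recurrence.

    Runs in O(n) time, one recurrence step per degree.
    """
    if n < 0:
        raise ValueError("degree must be non-negative")
    p_prev, p = 1.0, x  # P_0(x) and P_1(x)
    if n == 0:
        return p_prev
    for k in range(1, n):
        # (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p
```

For example, `legendre(2, 0.5)` returns -0.125, matching P_2(x) = (3x^2 - 1) / 2. Each step of the loop depends on the two previous values, so a single high-degree evaluation cannot be shortened by this route.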
Computing Real Roots of Real Polynomials ... and now For Real!
Very recent work introduces an asymptotically fast subdivision algorithm,
denoted ANewDsc, for isolating the real roots of a univariate real polynomial.
The method combines Descartes' Rule of Signs to test intervals for the
existence of roots, Newton iteration to speed up convergence against clusters
of roots, and approximate computation to decrease the required precision. It
achieves record bounds on the worst-case complexity for the considered problem,
matching the complexity of Pan's method for computing all complex roots and
improving upon the complexity of other subdivision methods by several
orders of magnitude.
In the article at hand, we report on an implementation of ANewDsc on top of
the RS root isolator. RS is a highly efficient realization of the classical
Descartes method and currently serves as the default real root solver in Maple.
We describe crucial design changes within ANewDsc and RS that led to a
high-performance implementation without harming the theoretical complexity of
the underlying algorithm.
With an excerpt of our extensive collection of benchmarks, available online
at http://anewdsc.mpi-inf.mpg.de/, we illustrate that the theoretical gain in
performance of ANewDsc over other subdivision methods also transfers into
practice. These experiments also show that our new implementation outperforms
both RS and mature competitors by orders of magnitude for notoriously hard instances
with clustered roots. For all other instances, we avoid almost any overhead by
integrating additional optimizations and heuristics.
Comment: Accepted for presentation at the 41st International Symposium on
Symbolic and Algebraic Computation (ISSAC), July 19-22, 2016, Waterloo,
Ontario, Canada
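The root test at the heart of Descartes-style methods can be illustrated with a short Python sketch. Descartes' Rule of Signs states that the number of positive real roots of a real polynomial, counted with multiplicity, equals the number of sign changes in its coefficient sequence or is smaller by an even number. The code below counts those sign changes only; it is not ANewDsc or RS, and the function name `sign_variations` is chosen here for illustration.

```python
def sign_variations(coeffs):
    """Count sign changes in a coefficient sequence, skipping zeros.

    By Descartes' Rule of Signs, the number of positive real roots
    (with multiplicity) equals this count or is less by an even number.
    The order of the coefficients (ascending or descending degree)
    does not affect the count.
    """
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

For x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3), `sign_variations([1, -6, 11, -6])` returns 3, matching its three positive roots. A count of 0 certifies that there are no positive roots and a count of 1 certifies exactly one, which is what makes the test usable as a stopping criterion in subdivision.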
A matrix-free high-order discontinuous Galerkin compressible Navier-Stokes solver: A performance comparison of compressible and incompressible formulations for turbulent incompressible flows
Both compressible and incompressible Navier-Stokes solvers can be used and
are used to solve incompressible turbulent flow problems. In the compressible
case, the Mach number is then considered as a solver parameter that is set to a
small value in order to mimic incompressible flows.
This strategy is widely used for high-order discontinuous Galerkin
discretizations of the compressible Navier-Stokes equations. The present work
raises the question regarding the computational efficiency of compressible DG
solvers as compared to a genuinely incompressible formulation. Our
contributions to the state-of-the-art are twofold: Firstly, we present a
high-performance discontinuous Galerkin solver for the compressible
Navier-Stokes equations based on a highly efficient matrix-free implementation
that targets modern cache-based multicore architectures. The performance
results presented in this work focus on the node-level performance and our
results suggest that there is great potential for further performance
improvements for current state-of-the-art discontinuous Galerkin
implementations of the compressible Navier-Stokes equations. Secondly, this
compressible Navier-Stokes solver is put into perspective by comparing it to an
incompressible DG solver that uses the same matrix-free implementation. We
discuss algorithmic differences between both solution strategies and present an
in-depth numerical investigation of the performance. The considered benchmark
test cases are the three-dimensional Taylor-Green vortex problem as a
representative of transitional flows and the turbulent channel flow problem as
a representative of wall-bounded turbulent flows.
Comparison of Current Gravity Estimation and Determination Models
This paper will discuss the history of gravity estimation and determination models while analyzing methods that are in development. Some fundamental methods for calculating the gravity field include spherical harmonics solutions, local weighted interpolation, and global point mascon modeling (PMC). Recently, high accuracy measurements have become more accessible, and the requirements for high order geopotential modeling have become more stringent. Interest in irregular bodies, accurate models of the hydrological system, and on-board processing has demanded a comprehensive model that can quickly and accurately compute the geopotential with low memory costs. This trade study of current geopotential modeling techniques will reveal that each modeling technique has a unique use case. It is notable that the spherical harmonics model is relatively accurate but poses a cumbersome inversion problem. PMC and interpolation models, on the other hand, are computationally efficient, but require more research to become robust models with high levels of accuracy. Considerations of the trade study will suggest further research for the point mascon model. The PMC model should be improved through mascon refinement, direct solutions that stem from geodetic measurements, and further validation of the gravity gradient. Finally, the potential for each model to be implemented with parallel computation will be shown to lead to large improvements in computing time while reducing the memory cost for each technique.
Aerospace Engineering and Engineering Mechanics