
    Parallel sparse interpolation using small primes

    To interpolate a supersparse polynomial with integer coefficients, two alternative approaches are the Prony-based "big prime" technique, which works over a single large finite field, and the more recently proposed "small primes" technique, which reduces the unknown sparse polynomial to many low-degree dense polynomials. While the latter technique has not yet reached the same theoretical efficiency as Prony-based methods, it has obvious potential for parallelization. We present a heuristic "small primes" interpolation algorithm and report on a low-level C implementation using FLINT and MPI.
    Comment: Accepted to PASCO 201
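
    As a rough illustration of the "small primes" idea (the names and data layout here are invented for the sketch, not taken from the paper's FLINT/MPI code): reducing a supersparse polynomial $f$ modulo $x^p - 1$ for a small prime $p$ folds every term $c\,x^e$ onto $x^{e \bmod p}$, producing a dense image of degree less than $p$, and the images for different primes are independent, which is where the parallelism comes from.

```python
def dense_image(terms, p):
    """Fold a sparse polynomial, given as {exponent: coefficient},
    into its dense image modulo (x^p - 1)."""
    image = [0] * p
    for e, c in terms.items():
        image[e % p] += c
    return image

# A supersparse polynomial: 3*x^1000003 - 7*x^12 + 5
f = {1000003: 3, 12: -7, 0: 5}

# Each prime reveals every exponent modulo p; combining the images for
# enough primes (e.g. by CRT on the exponents, with care for collisions)
# recovers the true sparse representation.
for p in (5, 7, 11):
    print(p, dense_image(f, p))
```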

    Newton's method in practice II: The iterated refinement Newton method and near-optimal complexity for finding all roots of some polynomials of very large degrees

    We present a practical implementation based on Newton's method to find all roots of several families of complex polynomials of degrees exceeding one billion ($10^9$), so that the observed complexity to find all roots is between $O(d \ln d)$ and $O(d \ln^3 d)$ (measuring complexity in terms of the number of Newton iterations or computing time). All computations were performed successfully on standard desktop computers built between 2007 and 2012.
    Comment: 24 pages, 19 figures. Update in v2 incorporates progress on polynomials of even higher degrees (greater than 1 billion)
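
    A minimal sketch of the basic strategy (the paper's iterated-refinement scheme is considerably more sophisticated; the starting circle, iteration count, and tolerances below are illustrative assumptions): run Newton's method from many points on a circle enclosing all roots and collect the distinct limits.

```python
import cmath

def newton_roots(p, dp, degree, radius=2.0, starts_per_root=4, iters=200):
    """Run Newton's method z <- z - p(z)/p'(z) from points spread on a
    circle of the given radius; keep each distinct converged root."""
    roots = []
    n_starts = starts_per_root * degree
    for k in range(n_starts):
        z = radius * cmath.exp(2j * cmath.pi * k / n_starts)
        for _ in range(iters):
            d = dp(z)
            if d == 0:
                break
            z -= p(z) / d
        # tolerances are illustrative, not the paper's stopping criteria
        if abs(p(z)) < 1e-9 and all(abs(z - r) > 1e-6 for r in roots):
            roots.append(z)
    return roots

# Example: the three cube roots of unity
print(newton_roots(lambda z: z**3 - 1, lambda z: 3 * z**2, degree=3))
```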

    O(1) Computation of Legendre polynomials and Gauss-Legendre nodes and weights for parallel computing

    A self-contained set of algorithms is proposed for the fast evaluation of Legendre polynomials of arbitrary degree and argument in $[-1, 1]$. More specifically, the time required to evaluate any Legendre polynomial, regardless of argument and degree, is bounded by a constant; i.e., the complexity is O(1). The proposed algorithm also immediately yields an O(1) algorithm for computing an arbitrary Gauss-Legendre quadrature node. Such a capability is crucial for efficiently performing certain parallel computations with high-order Legendre polynomials, such as computing an integral in parallel by means of Gauss-Legendre quadrature and the parallel evaluation of Legendre series. In order to achieve the O(1) complexity, novel efficient asymptotic expansions are derived and used alongside known results. A C++ implementation is available from the authors that includes routines for evaluating Legendre polynomials and Gauss-Legendre quadrature rules.
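
    The gist of the O(1) strategy can be sketched as follows (the crossover threshold and the single asymptotic term are simplifications; the paper derives more accurate expansions, including ones valid near the endpoints): use the three-term recurrence for moderate degrees and a fixed number of asymptotic terms for large degrees, so the cost never grows with $n$.

```python
import math

def legendre(n, x):
    """Evaluate P_n(x) for -1 < x < 1: recurrence for small n,
    leading-order interior asymptotics for large n."""
    if n < 100:  # crossover threshold is illustrative
        # (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        p_prev, p = 1.0, x
        if n == 0:
            return p_prev
        for k in range(1, n):
            p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
        return p
    # Leading term of the classical asymptotic expansion away from +-1;
    # the paper adds further terms and endpoint expansions for accuracy.
    theta = math.acos(x)
    return math.sqrt(2.0 / (math.pi * n * math.sin(theta))) * \
        math.cos((n + 0.5) * theta - math.pi / 4)

print(legendre(10, 0.3))     # recurrence branch
print(legendre(10**6, 0.3))  # constant-time asymptotic branch
```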

    Square-rich fixed point polynomial evaluation on FPGAs

    Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the fewest operations but can lead to increased latency due to its serial computation. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm, which reforms the evaluation process to include an increased number of squaring steps. By using a squarer design that is more efficient than a general multiplier, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied to fixed-point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule, with only a 4% area overhead, in evaluating degree-5 polynomials.
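
    A small sketch of the trade-off being optimized (the paper's square-rich scheme targets FPGA squarer blocks; the even/odd split below merely illustrates how squarings can expose parallelism and replace general multiplications):

```python
def horner(coeffs, x):
    """Fewest operations, but a fully serial dependency chain.
    coeffs[i] is the coefficient of x**i."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def even_odd(coeffs, x):
    """p(x) = p_even(x^2) + x * p_odd(x^2): the two halves can be
    evaluated in parallel, and x*x is a squaring, which is cheaper
    than a general multiplication in hardware."""
    xx = x * x
    return horner(coeffs[0::2], xx) + x * horner(coeffs[1::2], xx)

coeffs = [5.0, -3.0, 0.5, 2.0, 1.0, -0.25]  # a degree-5 example
print(horner(coeffs, 1.7), even_odd(coeffs, 1.7))
```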

    How proofs are prepared at Camelot

    We study a design framework for robust, independently verifiable, and workload-balanced distributed algorithms working on a common input. An algorithm based on the framework is essentially a distributed encoding procedure for a Reed-Solomon code, which enables (a) robustness against Byzantine failures, with intrinsic error correction and identification of failed nodes, and (b) independent randomized verification to check the entire computation for correctness, which takes essentially no more resources than each node individually contributes to the computation. The framework builds on recent Merlin-Arthur proofs of batch evaluation of Williams [Electron. Colloq. Comput. Complexity, Report TR16-002, January 2016], with the observation that Merlin's magic is not needed for batch evaluation: mere Knights can prepare the proof, in parallel, and with intrinsic error correction. The contribution of this paper is to show that in many cases the verifiable batch evaluation framework admits algorithms that match in total resource consumption the best known sequential algorithm for solving the problem. As our main result, we show that the $k$-cliques in an $n$-vertex graph can be counted and verified in per-node $O(n^{(\omega+\epsilon)k/6})$ time and space on $O(n^{(\omega+\epsilon)k/6})$ compute nodes, for any constant $\epsilon > 0$ and positive integer $k$ divisible by 6, where $2 \leq \omega < 2.3728639$ is the exponent of matrix multiplication. This matches in total running time the best known sequential algorithm, due to Nešetřil and Poljak [Comment. Math. Univ. Carolin. 26 (1985) 415-419], and considerably improves its space usage and parallelizability. Further results include novel algorithms for counting triangles in sparse graphs, computing the chromatic polynomial of a graph, and computing the Tutte polynomial of a graph.
    Comment: 42 pages
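
    For a feel of the matrix-multiplication connection underlying the $k$-clique result (this is the classical sequential baseline, not the paper's distributed framework): the number of triangles in a graph with adjacency matrix $A$ is $\mathrm{trace}(A^3)/6$, and the Nešetřil-Poljak bound generalizes this by counting triangles in an auxiliary graph whose vertices are the $k/3$-cliques.

```python
import numpy as np

def count_triangles(adj):
    """trace(A^3)/6: each triangle is counted once per vertex (3)
    and once per orientation (2)."""
    a = np.array(adj)
    return int(np.trace(a @ a @ a)) // 6

# A 4-cycle 0-1-2-3 with chord 0-2: two triangles, {0,1,2} and {0,2,3}
adj = [[0, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [1, 0, 1, 0]]
print(count_triangles(adj))  # 2
```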

    Polynomial Size Analysis of First-Order Shapely Functions

    We present a size-aware type system for first-order shapely function definitions. Here, a function definition is called shapely when the size of the result is determined exactly by a polynomial in the sizes of the arguments. Examples of shapely function definitions include implementations of matrix multiplication and the Cartesian product of two lists. The type system is proved to be sound w.r.t. the operational semantics of the language. The type-checking problem is shown to be undecidable in general. We define a natural syntactic restriction such that type checking becomes decidable, even though size polynomials are not necessarily linear or monotonic. Furthermore, we show that the type-inference problem is at least semi-decidable under this restriction. We have implemented a procedure that combines run-time testing and type checking to automatically obtain size dependencies. It terminates on total typable function definitions.
    Comment: 35 pages, 1 figure
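
    A toy illustration of what "shapely" means (the annotation in the comment uses a made-up syntax, not the paper's): the Cartesian product is shapely because its output size is exactly the polynomial $n \cdot m$ in the input sizes.

```python
# A size-aware type could record something like
#   cartesian : L(n, a) * L(m, b) -> L(n * m, (a, b))
# (annotation syntax invented for illustration).
def cartesian(xs, ys):
    return [(x, y) for x in xs for y in ys]

xs, ys = [1, 2, 3], ["a", "b"]
out = cartesian(xs, ys)
assert len(out) == len(xs) * len(ys)  # the size polynomial n * m, exactly
print(len(out))  # 6
```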

    Efficient Explicit Time Stepping of High Order Discontinuous Galerkin Schemes for Waves

    This work presents algorithms for the efficient implementation of discontinuous Galerkin methods with explicit time stepping for acoustic wave propagation on unstructured meshes of quadrilaterals or hexahedra. A crucial step towards efficiency is to evaluate operators in a matrix-free way with sum-factorization kernels. The method allows for general curved geometries and variable coefficients. Temporal discretization is carried out by low-storage explicit Runge-Kutta schemes and the arbitrary derivative (ADER) method. For ADER, we propose a flexible basis-change approach that combines cheap face integrals with cell evaluation using collocated nodes and quadrature points. Additionally, a degree reduction for the optimized cell evaluation is presented to decrease the computational cost of evaluating the higher-order spatial derivatives required in ADER time stepping. We analyze and compare the performance of state-of-the-art Runge-Kutta schemes and ADER time stepping with the proposed optimizations. ADER involves fewer operations and additionally reaches higher throughput through higher arithmetic intensity, and hence decreases the required computational time significantly. A comparison of Runge-Kutta and ADER at their respective CFL stability limits renders ADER especially beneficial at higher orders, where the Butcher barrier implies a disproportionately large number of stages. Moreover, vector updates in explicit Runge-Kutta schemes are shown to take a substantial share of the computational time due to their memory intensity.
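
    To make the sum-factorization point concrete, here is a hedged NumPy sketch (degree, operator, and data layout are illustrative): on a hexahedral element, applying a $(k+1) \times (k+1)$ one-dimensional operator along each tensor axis costs $O(3(k+1)^4)$ operations instead of the $O((k+1)^6)$ of the assembled three-dimensional matrix, which is what makes matrix-free evaluation pay off at high order.

```python
import numpy as np

k = 4                        # polynomial degree (illustrative)
n = k + 1
A = np.random.rand(n, n)     # 1-D operator, e.g. nodes -> quadrature points
u = np.random.rand(n, n, n)  # coefficients on one hexahedral element

# Sum factorization: apply A along each of the three tensor directions.
v = np.einsum('ai,ijk->ajk', A, u)
v = np.einsum('bj,ajk->abk', A, v)
v = np.einsum('ck,abk->abc', A, v)

# Same result as the assembled Kronecker operator, at far lower cost.
full = np.kron(np.kron(A, A), A) @ u.ravel()
print(np.allclose(v.ravel(), full))  # True
```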