
    Parallel sparse interpolation using small primes

    To interpolate a supersparse polynomial with integer coefficients, two alternative approaches are the Prony-based "big prime" technique, which works over a single large finite field, and the more recently proposed "small primes" technique, which reduces the unknown sparse polynomial to many low-degree dense polynomials. While the latter technique has not yet reached the same theoretical efficiency as Prony-based methods, it has obvious potential for parallelization. We present a heuristic "small primes" interpolation algorithm and report on a low-level C implementation using FLINT and MPI.
    Comment: Accepted to PASCO 201
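
    As a rough illustration of the "small primes" idea (the names and data layout here are invented for the sketch, not taken from the paper's FLINT/MPI code): reducing a supersparse polynomial $f$ modulo $x^p - 1$ for a small prime $p$ folds every term $c\,x^e$ onto $x^{e \bmod p}$, producing a dense image of degree less than $p$, and the images for different primes are independent, which is where the parallelism comes from.

```python
def dense_image(terms, p):
    """Fold a sparse polynomial, given as {exponent: coefficient},
    into its dense image modulo (x^p - 1)."""
    image = [0] * p
    for e, c in terms.items():
        image[e % p] += c
    return image

# A supersparse polynomial: 3*x^1000003 - 7*x^12 + 5
f = {1000003: 3, 12: -7, 0: 5}

# Each prime reveals every exponent modulo p; combining the images for
# enough primes (e.g. by CRT on the exponents, with care for collisions)
# recovers the true sparse representation.
for p in (5, 7, 11):
    print(p, dense_image(f, p))
```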

    Newton's method in practice II: The iterated refinement Newton method and near-optimal complexity for finding all roots of some polynomials of very large degrees

    We present a practical implementation based on Newton's method to find all roots of several families of complex polynomials of degrees exceeding one billion ($10^9$), so that the observed complexity to find all roots is between $O(d \ln d)$ and $O(d \ln^3 d)$ (measuring complexity in terms of the number of Newton iterations or computing time). All computations were performed successfully on standard desktop computers built between 2007 and 2012.
    Comment: 24 pages, 19 figures. Update in v2 incorporates progress on polynomials of even higher degrees (greater than 1 billion)
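
    A minimal sketch of the basic strategy (the paper's iterated-refinement scheme is considerably more sophisticated; the starting circle, iteration count, and tolerances below are illustrative assumptions): run Newton's method from many points on a circle enclosing all roots and collect the distinct limits.

```python
import cmath

def newton_roots(p, dp, degree, radius=2.0, starts_per_root=4, iters=200):
    """Run Newton's method z <- z - p(z)/p'(z) from points spread on a
    circle of the given radius; keep each distinct converged root."""
    roots = []
    n_starts = starts_per_root * degree
    for k in range(n_starts):
        z = radius * cmath.exp(2j * cmath.pi * k / n_starts)
        for _ in range(iters):
            d = dp(z)
            if d == 0:
                break
            z -= p(z) / d
        # tolerances are illustrative, not the paper's stopping criteria
        if abs(p(z)) < 1e-9 and all(abs(z - r) > 1e-6 for r in roots):
            roots.append(z)
    return roots

# Example: the three cube roots of unity
print(newton_roots(lambda z: z**3 - 1, lambda z: 3 * z**2, degree=3))
```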

    O(1) Computation of Legendre polynomials and Gauss-Legendre nodes and weights for parallel computing

    A self-contained set of algorithms is proposed for the fast evaluation of Legendre polynomials of arbitrary degree and argument in $[-1, 1]$. More specifically, the time required to evaluate any Legendre polynomial, regardless of argument and degree, is bounded by a constant; i.e., the complexity is O(1). The proposed algorithm also immediately yields an O(1) algorithm for computing an arbitrary Gauss-Legendre quadrature node. Such a capability is crucial for efficiently performing certain parallel computations with high-order Legendre polynomials, such as computing an integral in parallel by means of Gauss-Legendre quadrature and the parallel evaluation of Legendre series. In order to achieve the O(1) complexity, novel efficient asymptotic expansions are derived and used alongside known results. A C++ implementation is available from the authors that includes routines for evaluating Legendre polynomials and Gauss-Legendre quadrature rules.
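
    The gist of the O(1) strategy can be sketched as follows (the crossover threshold and the single asymptotic term are simplifications; the paper derives more accurate expansions, including ones valid near the endpoints): use the three-term recurrence for moderate degrees and a fixed number of asymptotic terms for large degrees, so the cost never grows with $n$.

```python
import math

def legendre(n, x):
    """Evaluate P_n(x) for -1 < x < 1: recurrence for small n,
    leading-order interior asymptotics for large n."""
    if n < 100:  # crossover threshold is illustrative
        # (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        p_prev, p = 1.0, x
        if n == 0:
            return p_prev
        for k in range(1, n):
            p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
        return p
    # Leading term of the classical asymptotic expansion away from +-1;
    # the paper adds further terms and endpoint expansions for accuracy.
    theta = math.acos(x)
    return math.sqrt(2.0 / (math.pi * n * math.sin(theta))) * \
        math.cos((n + 0.5) * theta - math.pi / 4)

print(legendre(10, 0.3))     # recurrence branch
print(legendre(10**6, 0.3))  # constant-time asymptotic branch
```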

    Square-rich fixed point polynomial evaluation on FPGAs

    Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the fewest operations but can lead to increased latency due to its serial computation. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm, which reforms the evaluation process to include an increased number of squaring steps. By using a squarer design that is more efficient than a general multiplier, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied to fixed-point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule, with only a 4% area overhead, in evaluating degree-5 polynomials.
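
    A small sketch of the trade-off being optimized (the paper's square-rich scheme targets FPGA squarer blocks; the even/odd split below merely illustrates how squarings can expose parallelism and replace general multiplications):

```python
def horner(coeffs, x):
    """Fewest operations, but a fully serial dependency chain.
    coeffs[i] is the coefficient of x**i."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def even_odd(coeffs, x):
    """p(x) = p_even(x^2) + x * p_odd(x^2): the two halves can be
    evaluated in parallel, and x*x is a squaring, which is cheaper
    than a general multiplication in hardware."""
    xx = x * x
    return horner(coeffs[0::2], xx) + x * horner(coeffs[1::2], xx)

coeffs = [5.0, -3.0, 0.5, 2.0, 1.0, -0.25]  # a degree-5 example
print(horner(coeffs, 1.7), even_odd(coeffs, 1.7))
```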

    How proofs are prepared at Camelot

    We study a design framework for robust, independently verifiable, and workload-balanced distributed algorithms working on a common input. An algorithm based on the framework is essentially a distributed encoding procedure for a Reed-Solomon code, which enables (a) robustness against Byzantine failures, with intrinsic error correction and identification of failed nodes, and (b) independent randomized verification to check the entire computation for correctness, which takes essentially no more resources than each node individually contributes to the computation. The framework builds on recent Merlin-Arthur proofs of batch evaluation of Williams [Electron. Colloq. Comput. Complexity, Report TR16-002, January 2016], with the observation that Merlin's magic is not needed for batch evaluation: mere Knights can prepare the proof, in parallel, and with intrinsic error correction. The contribution of this paper is to show that in many cases the verifiable batch evaluation framework admits algorithms that match in total resource consumption the best known sequential algorithm for solving the problem. As our main result, we show that the $k$-cliques in an $n$-vertex graph can be counted and verified in per-node $O(n^{(\omega+\epsilon)k/6})$ time and space on $O(n^{(\omega+\epsilon)k/6})$ compute nodes, for any constant $\epsilon > 0$ and positive integer $k$ divisible by 6, where $2 \leq \omega < 2.3728639$ is the exponent of matrix multiplication. This matches in total running time the best known sequential algorithm, due to Nešetřil and Poljak [Comment. Math. Univ. Carolin. 26 (1985) 415-419], and considerably improves its space usage and parallelizability. Further results include novel algorithms for counting triangles in sparse graphs, computing the chromatic polynomial of a graph, and computing the Tutte polynomial of a graph.
    Comment: 42 pages
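
    For a feel of the matrix-multiplication connection underlying the $k$-clique result (this is the classical sequential baseline, not the paper's distributed framework): the number of triangles in a graph with adjacency matrix $A$ is $\mathrm{trace}(A^3)/6$, and the Nešetřil-Poljak bound generalizes this by counting triangles in an auxiliary graph whose vertices are the $k/3$-cliques.

```python
import numpy as np

def count_triangles(adj):
    """trace(A^3)/6: each triangle is counted once per vertex (3)
    and once per orientation (2)."""
    a = np.array(adj)
    return int(np.trace(a @ a @ a)) // 6

# A 4-cycle 0-1-2-3 with chord 0-2: two triangles, {0,1,2} and {0,2,3}
adj = [[0, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [1, 0, 1, 0]]
print(count_triangles(adj))  # 2
```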

    Polynomial Size Analysis of First-Order Shapely Functions

    We present a size-aware type system for first-order shapely function definitions. Here, a function definition is called shapely when the size of the result is determined exactly by a polynomial in the sizes of the arguments. Examples of shapely function definitions include implementations of matrix multiplication and the Cartesian product of two lists. The type system is proved to be sound w.r.t. the operational semantics of the language. The type-checking problem is shown to be undecidable in general. We define a natural syntactic restriction such that type checking becomes decidable, even though size polynomials are not necessarily linear or monotonic. Furthermore, we show that the type-inference problem is at least semi-decidable under this restriction. We have implemented a procedure that combines run-time testing and type checking to automatically obtain size dependencies. It terminates on total typable function definitions.
    Comment: 35 pages, 1 figure
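
    A toy illustration of what "shapely" means (the annotation in the comment uses a made-up syntax, not the paper's): the Cartesian product is shapely because its output size is exactly the polynomial $n \cdot m$ in the input sizes.

```python
# A size-aware type could record something like
#   cartesian : L(n, a) * L(m, b) -> L(n * m, (a, b))
# (annotation syntax invented for illustration).
def cartesian(xs, ys):
    return [(x, y) for x in xs for y in ys]

xs, ys = [1, 2, 3], ["a", "b"]
out = cartesian(xs, ys)
assert len(out) == len(xs) * len(ys)  # the size polynomial n * m, exactly
print(len(out))  # 6
```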

    Efficient Explicit Time Stepping of High Order Discontinuous Galerkin Schemes for Waves

    This work presents algorithms for the efficient implementation of discontinuous Galerkin methods with explicit time stepping for acoustic wave propagation on unstructured meshes of quadrilaterals or hexahedra. A crucial step towards efficiency is to evaluate operators in a matrix-free way with sum-factorization kernels. The method allows for general curved geometries and variable coefficients. Temporal discretization is carried out by low-storage explicit Runge-Kutta schemes and the arbitrary derivative (ADER) method. For ADER, we propose a flexible basis-change approach that combines cheap face integrals with cell evaluation using collocated nodes and quadrature points. Additionally, a degree reduction for the optimized cell evaluation is presented to decrease the computational cost of evaluating the higher-order spatial derivatives required in ADER time stepping. We analyze and compare the performance of state-of-the-art Runge-Kutta schemes and ADER time stepping with the proposed optimizations. ADER involves fewer operations and additionally reaches higher throughput through higher arithmetic intensity, and hence decreases the required computational time significantly. A comparison of Runge-Kutta and ADER at their respective CFL stability limits renders ADER especially beneficial at higher orders, where the Butcher barrier implies a disproportionately large number of stages. Moreover, vector updates in explicit Runge-Kutta schemes are shown to take a substantial share of the computational time due to their memory intensity.
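
    To make the sum-factorization point concrete, here is a hedged NumPy sketch (degree, operator, and data layout are illustrative): on a hexahedral element, applying a $(k+1) \times (k+1)$ one-dimensional operator along each tensor axis costs $O(3(k+1)^4)$ operations instead of the $O((k+1)^6)$ of the assembled three-dimensional matrix, which is what makes matrix-free evaluation pay off at high order.

```python
import numpy as np

k = 4                        # polynomial degree (illustrative)
n = k + 1
A = np.random.rand(n, n)     # 1-D operator, e.g. nodes -> quadrature points
u = np.random.rand(n, n, n)  # coefficients on one hexahedral element

# Sum factorization: apply A along each of the three tensor directions.
v = np.einsum('ai,ijk->ajk', A, u)
v = np.einsum('bj,ajk->abk', A, v)
v = np.einsum('ck,abk->abc', A, v)

# Same result as the assembled Kronecker operator, at far lower cost.
full = np.kron(np.kron(A, A), A) @ u.ravel()
print(np.allclose(v.ravel(), full))  # True
```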