Parallel sparse interpolation using small primes
To interpolate a supersparse polynomial with integer coefficients, two alternative approaches are the Prony-based "big prime" technique, which works over a single large finite field, and the more recently proposed "small primes" technique, which reduces the unknown sparse polynomial to many low-degree dense polynomials. While the latter technique has not yet reached the same theoretical efficiency as Prony-based methods, it has obvious potential for parallelization. We present a heuristic "small primes" interpolation algorithm and report on a low-level C implementation using FLINT and MPI.
Comment: Accepted to PASCO 201
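The core reduction behind the "small primes" technique can be illustrated in a few lines: working modulo x^p - 1 for a small prime p folds every exponent modulo p, so a supersparse polynomial collapses to a dense image of degree less than p. The sketch below shows this algebraic identity for a known sparse polynomial (function and type names are illustrative; in the actual interpolation setting the dense image must be obtained from black-box evaluations, since f is unknown):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Dense image of a sparse integer polynomial f modulo (x^p - 1):
// every term c * x^e contributes c to the coefficient of x^(e mod p).
std::vector<int64_t> reduce_mod_xp_minus_1(
    const std::vector<std::pair<uint64_t, int64_t>>& sparse_terms,  // (exponent, coefficient)
    uint64_t p) {
  std::vector<int64_t> dense(p, 0);
  for (const auto& [exponent, coeff] : sparse_terms) {
    dense[exponent % p] += coeff;  // exponents only collide modulo p
  }
  return dense;
}
```

Running this reduction for many distinct small primes, one MPI rank per prime say, gives independent dense images; recovering each term's true exponent from its residues modulo the various primes is the combination step that the heuristic algorithm addresses.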
Newton's method in practice II: The iterated refinement Newton method and near-optimal complexity for finding all roots of some polynomials of very large degrees
We present a practical implementation based on Newton's method to find all roots of several families of complex polynomials of degrees exceeding one billion, so that the observed complexity to find all roots is near-optimal (measuring complexity in terms of the number of Newton iterations or computing time). All computations were performed successfully on standard desktop computers built between 2007 and 2012.
Comment: 24 pages, 19 figures. Update in v2 incorporates progress on polynomials of even higher degrees (greater than 1 billion).
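For reference, a single step of the underlying iteration is elementary; the difficulty the paper addresses is organizing billions of such steps for all roots at once. A minimal sketch of one Newton step, with p(z) and p'(z) evaluated together by Horner's rule (illustrative only; the paper's iterated refinement method adds substantially more machinery):

```cpp
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// One Newton step z -> z - p(z)/p'(z). Coefficients are ordered from the
// highest degree down to the constant term; Horner's rule yields p(z)
// and p'(z) in a single pass over the coefficients.
cplx newton_step(const std::vector<cplx>& coeffs, cplx z) {
  cplx p = coeffs[0];   // running value of p(z)
  cplx dp = 0.0;        // running value of p'(z)
  for (std::size_t i = 1; i < coeffs.size(); ++i) {
    dp = dp * z + p;    // derivative accumulates before p is updated
    p = p * z + coeffs[i];
  }
  return z - p / dp;
}
```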
O(1) Computation of Legendre polynomials and Gauss-Legendre nodes and weights for parallel computing
A self-contained set of algorithms is proposed for the fast evaluation of Legendre polynomials of arbitrary degree and argument in [-1, 1]. More specifically, the time required to evaluate any Legendre polynomial, regardless of argument and degree, is bounded by a constant; i.e., the complexity is O(1). The proposed algorithm also immediately yields an O(1) algorithm for computing an arbitrary Gauss-Legendre quadrature node. Such a capability is crucial for efficiently performing certain parallel computations with high-order Legendre polynomials, such as computing an integral in parallel by means of Gauss-Legendre quadrature and the parallel evaluation of Legendre series. In order to achieve the O(1) complexity, novel efficient asymptotic expansions are derived and used alongside known results. A C++ implementation is available from the authors that includes evaluation routines for Legendre polynomials and Gauss-Legendre quadrature rules.
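For contrast with the O(1) claim, the textbook way to evaluate P_n(x) is the three-term recurrence, which costs O(n) per evaluation and, being inherently sequential in n, parallelizes poorly; this is precisely the cost the asymptotic expansions avoid. A minimal sketch of the classical recurrence (not the paper's algorithm):

```cpp
#include <cassert>

// Evaluate the Legendre polynomial P_n(x) for x in [-1, 1] via the
// three-term recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}.
// O(n) work per call, versus the paper's O(1) asymptotic approach.
double legendre_recurrence(int n, double x) {
  assert(x >= -1.0 && x <= 1.0);
  if (n == 0) return 1.0;
  double p_prev = 1.0;  // P_0(x)
  double p_curr = x;    // P_1(x)
  for (int k = 1; k < n; ++k) {
    double p_next = ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1);
    p_prev = p_curr;
    p_curr = p_next;
  }
  return p_curr;
}
```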
Square-rich fixed point polynomial evaluation on FPGAs
Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the fewest steps but can incur high latency because the computation is entirely serial. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm that restructures the evaluation process to include an increased number of squaring steps. By using a squarer design that is more efficient than a general multiplier, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied to fixed-point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule with only a 4% area overhead when evaluating degree-5 polynomials.
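One generic way to trade general multiplications for squarings, shown here only to illustrate the idea rather than the paper's reformulation or its FPGA squarer design, is the even/odd split p(x) = p_even(x^2) + x * p_odd(x^2), which exposes an explicit squaring and two independent half-length evaluations:

```cpp
#include <vector>

// Evaluate p(x) = sum_i c[i] * x^i via the even/odd split
// p(x) = p_even(x^2) + x * p_odd(x^2). The x^2 maps to a cheap
// dedicated squarer in hardware, and the two half-length Horner
// evaluations are independent, so they can proceed in parallel.
double eval_even_odd_split(const std::vector<double>& c, double x) {
  const double x2 = x * x;  // the extra squaring step
  double even = 0.0, odd = 0.0;
  // Horner's rule over even- and odd-indexed coefficients, in powers of x^2.
  for (int i = static_cast<int>(c.size()) - 1; i >= 0; --i) {
    if (i % 2 == 0) even = even * x2 + c[i];
    else            odd  = odd  * x2 + c[i];
  }
  return even + x * odd;
}
```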
How proofs are prepared at Camelot
We study a design framework for robust, independently verifiable, and
workload-balanced distributed algorithms working on a common input. An
algorithm based on the framework is essentially a distributed encoding
procedure for a Reed--Solomon code, which enables (a) robustness against
Byzantine failures with intrinsic error-correction and identification of failed
nodes, and (b) independent randomized verification to check the entire
computation for correctness, which takes essentially no more resources than
each node individually contributes to the computation. The framework builds on
recent Merlin--Arthur proofs of batch evaluation of Williams~[{\em Electron.\
Colloq.\ Comput.\ Complexity}, Report TR16-002, January 2016] with the
observation that {\em Merlin's magic is not needed} for batch evaluation---mere
Knights can prepare the proof, in parallel, and with intrinsic
error-correction.
The contribution of this paper is to show that in many cases the verifiable
batch evaluation framework admits algorithms that match in total resource
consumption the best known sequential algorithm for solving the problem. As our
main result, we show that the $k$-cliques in an $n$-vertex graph can be counted {\em and} verified in per-node $O(n^{(\omega+\epsilon)k/6})$ time and space on $O(n^{(\omega+\epsilon)k/6})$ compute nodes, for any constant $\epsilon>0$ and positive integer $k$ divisible by $6$, where $\omega$ is the exponent of matrix multiplication. This matches in total running time the best known sequential algorithm, due to Ne{\v{s}}et{\v{r}}il and Poljak [{\em Comment.~Math.~Univ.~Carolin.}~26 (1985) 415--419], and considerably improves its space usage and parallelizability. Further results include novel algorithms for counting triangles in sparse graphs, computing the chromatic polynomial of a graph, and computing the Tutte polynomial of a graph.
Comment: 42 pages
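The "proof" in this framework is essentially a Reed--Solomon codeword assembled in a distributed fashion: the computation's result is packaged as a low-degree polynomial over a finite field and each node contributes one evaluation, so output from faulty nodes appears as correctable symbol errors. A minimal sketch of plain Reed--Solomon encoding by evaluation (the field, names, and data layout are illustrative assumptions, not the paper's construction):

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t P = 2147483647;  // prime modulus 2^31 - 1, chosen for illustration

// Encode k message symbols (coefficients of a degree < k polynomial over F_P,
// assumed already reduced mod P) as a Reed-Solomon codeword of length m >= k:
// the evaluations at 1, 2, ..., m. In the distributed setting, node j would
// compute only its own symbol f(j + 1).
std::vector<uint64_t> rs_encode(const std::vector<uint64_t>& message, uint64_t m) {
  std::vector<uint64_t> codeword(m);
  for (uint64_t j = 0; j < m; ++j) {
    const uint64_t x = (j + 1) % P;
    uint64_t value = 0;  // Horner evaluation of the message polynomial at x
    for (auto it = message.rbegin(); it != message.rend(); ++it) {
      value = (value * x + *it) % P;  // operands < 2^31, so the product fits in 64 bits
    }
    codeword[j] = value;
  }
  return codeword;
}
```

Since any k correct symbols determine the degree-(k-1) polynomial, a verifier can re-interpolate from a subset and cross-check the remaining symbols, which is the intuition behind the intrinsic error-correction and identification of failed nodes.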
Polynomial Size Analysis of First-Order Shapely Functions
We present a size-aware type system for first-order shapely function
definitions. Here, a function definition is called shapely when the size of the
result is determined exactly by a polynomial in the sizes of the arguments.
Typical examples of shapely function definitions are implementations of matrix
multiplication and of the Cartesian product of two lists. The type system is
proved to be sound w.r.t. the operational semantics of the language. The type
checking problem is shown to be undecidable in general. We define a natural
syntactic restriction such that the type checking becomes decidable, even
though size polynomials are not necessarily linear or monotonic. Furthermore,
we have shown that the type-inference problem is at least semi-decidable (under
this restriction). We have implemented a procedure that combines run-time
testing and type-checking to automatically obtain size dependencies. It
terminates on total typable function definitions.
Comment: 35 pages, 1 figure
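To make "shapely" concrete: the Cartesian product of lists of sizes n and m always has size exactly n * m, so its size annotation is the (non-linear but exact) polynomial n * m. A small illustration of this size invariant (the paper's system is a type system for a first-order functional language; C++ is used here only to state the invariant executably):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Cartesian product: a shapely function in the paper's sense, because the
// result size is exactly the polynomial n * m in the argument sizes n, m.
template <typename A, typename B>
std::vector<std::pair<A, B>> cartesian(const std::vector<A>& xs,
                                       const std::vector<B>& ys) {
  std::vector<std::pair<A, B>> out;
  out.reserve(xs.size() * ys.size());
  for (const A& x : xs)
    for (const B& y : ys)
      out.emplace_back(x, y);
  // The size polynomial that a size-aware type would record:
  assert(out.size() == xs.size() * ys.size());
  return out;
}
```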
Efficient Explicit Time Stepping of High Order Discontinuous Galerkin Schemes for Waves
This work presents algorithms for the efficient implementation of
discontinuous Galerkin methods with explicit time stepping for acoustic wave
propagation on unstructured meshes of quadrilaterals or hexahedra. A crucial
step towards efficiency is to evaluate operators in a matrix-free way with
sum-factorization kernels. The method allows for general curved geometries and
variable coefficients. Temporal discretization is carried out by low-storage
explicit Runge-Kutta schemes and the arbitrary derivative (ADER) method. For
ADER, we propose a flexible basis change approach that combines cheap face
integrals with cell evaluation using collocated nodes and quadrature points.
Additionally, a degree reduction for the optimized cell evaluation is presented
to decrease the computational cost when evaluating higher order spatial
derivatives as required in ADER time stepping. We analyze and compare the
performance of state-of-the-art Runge-Kutta schemes and ADER time stepping with
the proposed optimizations. ADER involves fewer operations and additionally reaches higher throughput through higher arithmetic intensity, and hence significantly decreases the required computational time. Comparing Runge-Kutta and ADER at their respective CFL stability limits renders ADER especially beneficial for higher orders, where the Butcher barrier implies a disproportionate number of stages. Moreover, vector updates in explicit Runge-Kutta schemes are shown to take a substantial share of the computational time due to their memory intensity.
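The sum-factorization kernels mentioned above exploit the tensor-product structure of quadrilateral and hexahedral bases: instead of applying one dense (p+1)^d x (p+1)^d matrix per cell, the same result is obtained by applying a small one-dimensional matrix along each coordinate direction in turn, reducing the cost per cell from O(p^{2d}) to O(d p^{d+1}). A minimal two-dimensional sketch (matrix sizes and storage layout are illustrative assumptions):

```cpp
#include <vector>

// Apply a 1D operator A (m x n, row-major) along both directions of an
// n x n coefficient block U (row-major): result V = A * U * A^T, computed
// as two passes of small matrix products instead of one dense
// (m*m) x (n*n) operator.
std::vector<double> sum_factorization_2d(const std::vector<double>& A, int m, int n,
                                         const std::vector<double>& U) {
  // Pass 1: contract the first index, T(a, j) = sum_i A(a, i) * U(i, j).
  std::vector<double> T(m * n, 0.0);
  for (int a = 0; a < m; ++a)
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j)
        T[a * n + j] += A[a * n + i] * U[i * n + j];
  // Pass 2: contract the second index, V(a, b) = sum_j T(a, j) * A(b, j).
  std::vector<double> V(m * m, 0.0);
  for (int a = 0; a < m; ++a)
    for (int b = 0; b < m; ++b)
      for (int j = 0; j < n; ++j)
        V[a * m + b] += T[a * n + j] * A[b * n + j];
  return V;
}
```

In d dimensions the two passes become d passes over the cell data, which is what keeps high-order evaluation affordable.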