4 research outputs found

    Parallel Computation of the Minimal Elements of a Poset

    Get PDF
    Computing the minimal elements of a partially ordered finite set (poset) is a fundamental problem in combinatorics with numerous applications such as polynomial expression optimization, transversal hypergraph generation and redundant component removal, to name a few. We propose a divide-and-conquer algorithm which is not only cache-oblivious but also can be parallelized free of determinacy races. We have implemented it in Cilk++ targeting multicores. For our test problems of sufficiently large input size our code demonstrates a linear speedup on 32 cores.National Science Foundation (U.S.). (Grant number CNS-0615215)National Science Foundation (U.S.). (Grant number CCF- 0621511

    On the Factor Refinement Principle and its Implementation on Multicore Architectures

    Get PDF
    The factor refinement principle turns a partial factorization of integers (or polynomi­ als) into a more complete factorization represented by basis elements and exponents, with basis elements that are pairwise coprime. There are lots of applications of this refinement technique such as simplifying systems of polynomial inequations and, more generally, speeding up certain algebraic algorithms by eliminating redundant expressions that may occur during intermediate computations. Successive GCD computations and divisions are used to accomplish this task until all the basis elements are pairwise coprime. Moreover, square-free factorization (which is the first step of many factorization algorithms) is used to remove the repeated patterns from each input element. Differentiation, division and GCD calculation op­ erations are required to complete this pre-processing step. Both factor refinement and square-free factorization often rely on plain (quadratic) algorithms for multipli­ cation but can be substantially improved with asymptotically fast multiplication on sufficiently large input. In this work, we review the working principles and complexity estimates of the factor refinement, in case of plain arithmetic, as well as asymptotically fast arithmetic. Following this review process, we design, analyze and implement parallel adaptations of these factor refinement algorithms. We consider several algorithm optimization techniques such as data locality analysis, balancing subproblems, etc. to fully exploit modern multicore architectures. The Cilk++ implementation of our parallel algorithm based on the augment refinement principle of Bach, Driscoll and Shallit achieves linear speedup for input data of sufficiently large size

    Efficient Evaluation of Large Polynomials

    Get PDF
    In scientific computing, it is often required to evaluate a polynomial expression (or a matrix depending on some variables) at many points which are not known in advance or with coordinates containing “symbolic expressions”. In these circumstances, standard evaluation schemes, such as those based on Fast Fourier Transforms do not apply. Given a polynomial f expressed as the sum of its terms, we propose an algorithm which generates a representation of f optimizing the process of evaluating f at some points. In addition, this evaluation of f can be done efficiently in terms of data locality and parallelism. We have implemented our algorithm in the Cilk++ concurrency platform and our implementation achieves nearly linear speedup on 16 cores with large enough input. For some large polynomials, the generated schedule can be evaluated at least 10 times faster than the schedules produced by other available software solutions. Moreover, our code can handle much larger input polynomials
    corecore