A Multi-level Blocking Distinct Degree Factorization Algorithm
We give a new algorithm for performing the distinct-degree factorization of a
polynomial P(x) over GF(2), using a multi-level blocking strategy. The coarsest
level of blocking replaces GCD computations by multiplications, as suggested by
Pollard (1975), von zur Gathen and Shoup (1992), and others. The novelty of our
approach is that a finer level of blocking replaces multiplications by
squarings, which speeds up the computation in GF(2)[x]/P(x) of certain interval
polynomials when P(x) is sparse. As an application we give a fast algorithm to
search for all irreducible trinomials x^r + x^s + 1 of degree r over GF(2),
while producing a certificate that can be checked in less time than the full
search. Naive algorithms cost O(r^2) per trinomial, thus O(r^3) to search over
all trinomials of given degree r. Under a plausible assumption about the
distribution of factors of trinomials, the new algorithm has complexity O(r^2
(log r)^{3/2}(log log r)^{1/2}) for the search over all trinomials of degree r.
Our implementation achieves a speedup of greater than a factor of 560 over the
naive algorithm in the case r = 24036583 (a Mersenne exponent). Using our
program, we have found two new primitive trinomials of degree 24036583 over
GF(2) (the previous record degree was 6972593).
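As an illustration of the squaring-based arithmetic the abstract refers to, here is a minimal sketch (not the paper's blocked algorithm, and far slower than the authors' implementation) of the standard irreducibility test for a trinomial x^r + x^s + 1 over GF(2), with polynomials packed into Python integers (bit i holds the coefficient of x^i): P is irreducible iff x^(2^r) ≡ x (mod P) and, for every prime q dividing r, gcd(x^(2^(r/q)) + x, P) = 1. Note how each exponentiation by 2^k is just k squarings mod P.

```python
def clmul(a, b):
    """Carry-less (GF(2)[x]) multiplication of two bit-packed polynomials."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        b >>= 1
    return res

def mod_reduce(a, p):
    """Remainder of a modulo p in GF(2)[x] (schoolbook division)."""
    dp = p.bit_length() - 1
    while a and a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a

def mulmod(a, b, p):
    return mod_reduce(clmul(a, b), p)

def poly_gcd(a, b):
    """Euclidean GCD in GF(2)[x]."""
    while b:
        a, b = b, mod_reduce(a, b)
    return a

def prime_factors(n):
    fs, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            fs.add(d)
            n //= d
        d += 1
    if n > 1:
        fs.add(n)
    return fs

def is_irreducible_trinomial(r, s):
    """Test whether x^r + x^s + 1 is irreducible over GF(2)."""
    p = (1 << r) | (1 << s) | 1
    x = 2  # the polynomial x
    # x^(2^r) mod p via r squarings (in GF(2), addition costs nothing,
    # so repeated squaring dominates -- the operation the paper optimizes)
    t = x
    for _ in range(r):
        t = mulmod(t, t, p)
    if t != x:
        return False
    # no irreducible factor of proper degree r/q
    for q in prime_factors(r):
        t = x
        for _ in range(r // q):
            t = mulmod(t, t, p)
        if poly_gcd(t ^ x, p) != 1:
            return False
    return True
```

For example, `is_irreducible_trinomial(7, 1)` is `True` (x^7 + x + 1 is a well-known irreducible, indeed primitive, trinomial), while `is_irreducible_trinomial(4, 2)` is `False` since x^4 + x^2 + 1 = (x^2 + x + 1)^2.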
Hybrid static/dynamic scheduling for already optimized dense matrix factorization
We present the use of a hybrid static/dynamic scheduling strategy of the task
dependency graph for direct methods used in dense numerical linear algebra.
This strategy provides a balance of data locality, load balance, and low
dequeue overhead. We show that the usage of this scheduling in communication
avoiding dense factorization leads to significant performance gains. On a 48
core AMD Opteron NUMA machine, our experiments show that we can achieve up to
64% improvement over a version of CALU that uses fully dynamic scheduling, and
up to 30% improvement over the version of CALU that uses fully static
scheduling. On a 16-core Intel Xeon machine, our hybrid static/dynamic
scheduling approach is up to 8% faster than the version of CALU that uses a
fully static scheduling or fully dynamic scheduling. Our algorithm leads to
speedups over the corresponding routines for computing LU factorization in well
known libraries. On the 48 core AMD NUMA machine, our best implementation is up
to 110% faster than MKL, while on the 16 core Intel Xeon machine, it is up to
82% faster than MKL. Our approach also shows significant speedups compared with
PLASMA on both of these systems.
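The core idea of a hybrid static/dynamic schedule can be sketched in a few lines. The toy Python below (an illustration only, not CALU or the authors' task-graph scheduler, and with independent tasks rather than a dependency DAG) pre-assigns a tunable fraction of the tasks statically round-robin, which preserves locality and avoids dequeue overhead, and leaves the remaining tail in a shared queue from which workers pull dynamically to absorb load imbalance:

```python
import queue
import threading

def run_hybrid(tasks, n_workers, static_fraction=0.8):
    """Execute independent callables with a hybrid static/dynamic schedule.

    The first static_fraction of tasks is pre-assigned round-robin
    (no contention, good locality); the rest sits in a shared queue
    that idle workers drain (load balance for the stragglers).
    """
    n_static = int(len(tasks) * static_fraction)
    static_parts = [tasks[i:n_static:n_workers] for i in range(n_workers)]
    dynamic_q = queue.Queue()
    for t in tasks[n_static:]:
        dynamic_q.put(t)

    results, lock = [], threading.Lock()

    def worker(my_static):
        local = [t() for t in my_static]   # static phase: fixed assignment
        while True:                        # dynamic phase: pull until empty
            try:
                t = dynamic_q.get_nowait()
            except queue.Empty:
                break
            local.append(t())
        with lock:
            results.extend(local)

    threads = [threading.Thread(target=worker, args=(static_parts[i],))
               for i in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

The `static_fraction` knob mirrors the trade-off the abstract describes: at 1.0 the schedule is fully static (lowest overhead, worst balance), at 0.0 fully dynamic (best balance, highest dequeue overhead), with the hybrid sitting in between.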
Preparing sparse solvers for exascale computing
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices, where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.