    Recursive double-size fixed precision arithmetic

    This work is part of the SHIVA (Secured Hardware Immune Versatile Architecture) project, whose purpose is to provide a programmable and reconfigurable hardware module with a high level of security. We propose a recursive double-size fixed precision arithmetic called RecInt. Our work can be split in two parts. First, we developed a C++ software library with performance comparable to that of GMP. Second, our simple representation of the integers allows an implementation on FPGA. Our idea is to consider sizes that are a power of 2 and to apply doubling techniques to implement them efficiently: we design a recursive data structure where integers of size 2^k, for k > k0, can be stored as two integers of size 2^{k-1}. Obviously, for k <= k0 we use machine arithmetic instead (k0 depending on the architecture).
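
    The doubling idea is easy to sketch in C++. The following is a minimal illustration of the recursive representation and of carry-propagating addition, not the RecInt library itself; the base case k0 = 6 (a 64-bit machine word) and all identifiers are assumptions made for this example.

        // Minimal sketch (not the actual RecInt library): a 2^K-bit unsigned
        // integer stored recursively as two 2^(K-1)-bit halves, with machine
        // arithmetic at the base case K0 = 6 (64-bit words).
        #include <cstdint>
        #include <cstdio>

        template <unsigned K> struct ruint {      // 2^K-bit unsigned integer
            ruint<K - 1> high, low;               // value = high * 2^(2^(K-1)) + low
        };
        template <> struct ruint<6> {             // base case: one machine word
            std::uint64_t word;
        };

        // Base-case addition and increment use plain machine arithmetic.
        inline bool add(ruint<6>& r, const ruint<6>& a, const ruint<6>& b) {
            r.word = a.word + b.word;
            return r.word < a.word;               // unsigned wrap-around = carry out
        }
        inline bool inc(ruint<6>& x) { return ++x.word == 0; }

        // Recursive case: operate on the halves, then propagate the low-half carry.
        template <unsigned K> bool inc(ruint<K>& x) {
            return inc(x.low) ? inc(x.high) : false;
        }
        template <unsigned K>
        bool add(ruint<K>& r, const ruint<K>& a, const ruint<K>& b) {
            bool c_low  = add(r.low,  a.low,  b.low);
            bool c_high = add(r.high, a.high, b.high);
            if (c_low) c_high |= inc(r.high);
            return c_high;                        // carry out of the full 2^K bits
        }

        int main() {
            ruint<7> a{}, b{}, r{};               // three 128-bit integers, zeroed
            a.low.word = UINT64_MAX;              // a = 2^64 - 1
            b.low.word = 1;                       // b = 1
            add(r, a, b);                         // r = 2^64
            std::printf("high=%llu low=%llu\n",
                        (unsigned long long)r.high.word,
                        (unsigned long long)r.low.word);
        }

    Subtraction, shifts, and multiplication (schoolbook or Karatsuba-style on the two halves) would follow the same recursive pattern, which is what makes the representation convenient both for templated C++ code and for a hardware layout.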

    Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion

    We describe a new data format for storing triangular, symmetric, and Hermitian matrices called RFPF (Rectangular Full Packed Format). The standard two-dimensional arrays of Fortran and C (also known as full format) that are used to represent triangular and symmetric matrices waste nearly half of the storage space but provide high performance via the use of Level 3 BLAS. Standard packed format arrays fully utilize storage (array space) but provide low performance, as there is no Level 3 packed BLAS. We combine the good features of packed and full storage using RFPF to obtain high performance by using Level 3 BLAS, as RFPF is a standard full format representation. Also, RFPF requires exactly the same minimal storage as packed format. Each LAPACK full and/or packed triangular, symmetric, and Hermitian routine becomes a single new RFPF routine based on eight possible data layouts of RFPF. This new RFPF routine usually consists of two calls to the corresponding LAPACK full format routine and two calls to Level 3 BLAS routines. This means no new software is required. As examples, we present LAPACK routines for Cholesky factorization, Cholesky solution, and Cholesky inverse computation in RFPF to illustrate this new work and to describe its performance on several commonly used computer platforms. Performance of LAPACK full routines using RFPF versus LAPACK full routines using standard format, for both serial and SMP parallel processing, is about the same while using half the storage. Performance gains range from parity up to a factor of 43 for serial processing and up to a factor of 97 for SMP parallel processing when using vendor LAPACK full routines with RFPF rather than vendor and/or reference packed routines.
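
    To make the storage argument concrete, the sketch below packs the lower triangle of an even-order column-major matrix into an (n+1) x (n/2) rectangle, placing the transposed trailing triangle in the corner left free above the leading lower trapezoid. This is only an illustration of the counting argument; the exact placement is an assumption for this example and does not reproduce any of the eight RFPF layouts used by LAPACK.

        // Illustrative packing in the spirit of RFPF (not the exact LAPACK layout):
        // for even n, the n(n+1)/2 entries of a lower triangular matrix fit exactly
        // into an (n+1) x (n/2) rectangular, column-major, full-format array.
        #include <cassert>
        #include <cstdio>
        #include <vector>

        std::vector<double> packLowerRFP(const std::vector<double>& A, int n) {
            assert(n % 2 == 0);
            const int k = n / 2, ldr = n + 1;
            std::vector<double> AR(ldr * k);
            for (int j = 0; j < k; ++j)           // leading trapezoid, shifted down one row
                for (int i = j; i < n; ++i)
                    AR[(i + 1) + j * ldr] = A[i + j * n];
            for (int q = 0; q < k; ++q)           // trailing triangle, transposed, on top
                for (int p = 0; p <= q; ++p)
                    AR[p + q * ldr] = A[(k + q) + (k + p) * n];
            return AR;                            // every cell of the rectangle is used
        }

        int main() {
            const int n = 4;
            std::vector<double> A(n * n, 0.0);
            for (int j = 0; j < n; ++j)           // fill the lower triangle with markers
                for (int i = j; i < n; ++i)
                    A[i + j * n] = 10.0 * (i + 1) + (j + 1);
            auto AR = packLowerRFP(A, n);
            std::printf("rectangle holds %zu doubles vs %d for full storage\n",
                        AR.size(), n * n);
        }

    The rectangle holds exactly n(n+1)/2 entries, the same as packed format, while remaining an ordinary full-format array whose blocks can be handed to Level 3 BLAS.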

    A study of systems implementation languages for the POCCNET system

    The results of a study of systems implementation languages for the Payload Operations Control Center Network (POCCNET) are presented. Criteria are developed for evaluating the languages, and fifteen existing languages are evaluated on the basis of these criteria.

    Fast recursive filters for simulating nonlinear dynamic systems

    A fast and accurate computational scheme for simulating nonlinear dynamic systems is presented. The scheme assumes that the system can be represented by a combination of components of only two different types: first-order low-pass filters and static nonlinearities. The parameters of these filters and nonlinearities may depend on system variables, and the topology of the system may be complex, including feedback. Several examples taken from neuroscience are given: phototransduction, photopigment bleaching, and spike generation according to the Hodgkin-Huxley equations. The scheme uses two slightly different forms of autoregressive filters, with an implicit delay of zero for feedforward control and an implicit delay of half a sample distance for feedback control. On a fairly complex model of the macaque retinal horizontal cell it computes, for a given level of accuracy, 1-2 orders of magnitude faster than 4th-order Runge-Kutta. The computational scheme has minimal memory requirements, and is also suited for computation on a stream processor, such as a GPU (Graphical Processing Unit). Comment: 20 pages, 8 figures, 1 table. A comparison with 4th-order Runge-Kutta integration shows that the new algorithm is 1-2 orders of magnitude faster. The paper is in press now at Neural Computation.
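
    As a rough illustration of the basic building block, the sketch below implements a single first-order low-pass stage as a one-pole recursive (autoregressive) filter, y[i] = a*y[i-1] + (1-a)*x[i] with a = exp(-dt/tau). The paper's two filter variants (implicit delay of zero versus half a sample) and the state-dependent parameters are not reproduced here; this is only the generic discretization.

        // One-pole recursive low-pass filter: the elementary component of the scheme
        // described above, in its simplest time-invariant form (an assumption for
        // this sketch; in the paper the parameters may depend on system variables).
        #include <cmath>
        #include <cstdio>
        #include <vector>

        std::vector<double> lowpass(const std::vector<double>& x, double tau, double dt) {
            const double a = std::exp(-dt / tau); // pole for time constant tau
            std::vector<double> y;
            y.reserve(x.size());
            double state = 0.0;                   // zero initial condition assumed
            for (double xi : x) {
                state = a * state + (1.0 - a) * xi;
                y.push_back(state);
            }
            return y;
        }

        int main() {
            std::vector<double> step(100, 1.0);   // unit step input
            auto y = lowpass(step, /*tau=*/10e-3, /*dt=*/1e-3);
            std::printf("y[9] = %.4f (about 1 - 1/e = 0.6321)\n", y[9]);
        }

    Chaining such stages with static nonlinearities, and letting their parameters depend on the state, yields the kind of network the scheme is designed to integrate efficiently.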

    NumGfun: a Package for Numerical and Analytic Computation with D-finite Functions

    This article describes the implementation in the software package NumGfun of classical algorithms that operate on solutions of linear differential equations or recurrence relations with polynomial coefficients, including what seems to be the first general implementation of the fast high-precision numerical evaluation algorithms of Chudnovsky & Chudnovsky. In some cases, our descriptions contain improvements over existing algorithms. We also provide references to relevant ideas not currently used in NumGfun.
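
    The fast evaluation algorithms of Chudnovsky & Chudnovsky rest on binary splitting of the underlying recurrence; the toy sketch below applies the same range-splitting pattern to the series e - 1 = sum_{k>=1} 1/k!, with 64-bit integers standing in for the arbitrary-precision arithmetic a real implementation such as NumGfun would use.

        // Binary splitting for the series e - 1 = 1/1! + 1/2! + ...
        // split(a, b) returns (p, q) with p/q = sum_{k=a+1}^{b} a!/k! and q = b!/a!,
        // so the two halves of a range are merged with a few multiplications.
        #include <cstdint>
        #include <cstdio>

        struct PQ { std::uint64_t p, q; };

        PQ split(std::uint64_t a, std::uint64_t b) {
            if (b - a == 1) return {1, b};        // single term
            std::uint64_t m = (a + b) / 2;        // cut the range in half
            PQ left = split(a, m), right = split(m, b);
            return {left.p * right.q + right.p, left.q * right.q};
        }

        int main() {
            PQ r = split(0, 20);                  // 20 terms still fit in 64 bits
            std::printf("e ~= %.15f\n", 1.0 + (double)r.p / (double)r.q);
        }

    With big integers in place of uint64_t, the whole truncated sum is obtained with quasi-optimal bit complexity, which is the effect the fast D-finite evaluation algorithms exploit in a much more general setting, with error control.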

    Efficient implementation of the Hardy-Ramanujan-Rademacher formula

    We describe how the Hardy-Ramanujan-Rademacher formula can be implemented to allow the partition function p(n) to be computed with softly optimal complexity O(n^{1/2+o(1)}) and very little overhead. A new implementation based on these techniques achieves speedups in excess of a factor 500 over previously published software and has been used by the author to calculate p(10^{19}), an exponent twice as large as in previously reported computations. We also investigate performance for multi-evaluation of p(n), where our implementation of the Hardy-Ramanujan-Rademacher formula becomes superior to power series methods on far denser sets of indices than previous implementations. As an application, we determine over 22 billion new congruences for the partition function, extending Weaver's tabulation of 76,065 congruences. Comment: updated version containing an unconditional complexity proof; accepted for publication in LMS Journal of Computation and Mathematics.
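
    For reference, the convergent Rademacher form of the Hardy-Ramanujan-Rademacher expansion that such implementations evaluate is

        p(n) = \frac{1}{\pi\sqrt{2}} \sum_{k=1}^{\infty} A_k(n)\,\sqrt{k}\;
               \frac{d}{dn}\!\left( \frac{\sinh\!\bigl( \tfrac{\pi}{k}\sqrt{\tfrac{2}{3}\bigl(n - \tfrac{1}{24}\bigr)} \bigr)}{\sqrt{n - \tfrac{1}{24}}} \right),
        \qquad
        A_k(n) = \sum_{\substack{0 \le h < k \\ \gcd(h,k)=1}} e^{\pi i\, s(h,k) - 2\pi i n h/k},

    where s(h,k) denotes a Dedekind sum; the series is truncated after sufficiently many terms and the truncation error bounded so that the integer p(n) can be recovered by rounding.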

    Affine functions and series with co-inductive real numbers

    We extend the work of A. Ciaffaglione and P. Di Gianantonio on mechanical verification of algorithms for exact computation on real numbers, using infinite streams of digits implemented as co-inductive types. Four aspects are studied. The first aspect concerns the proof that digit streams can be related to the real numbers that are already axiomatized in the proof system (axiomatized, but with no fixed representation). The second aspect revisits the definition of an addition function, looking at techniques to let the proof search mechanism perform the effective construction of an algorithm that is correct by construction. The third aspect concerns the definition of a function to compute affine formulas with positive rational coefficients. This should be understood as a testbed to describe a technique to combine co-recursion and recursion to obtain a model for an algorithm that appears at first sight to be outside the expressive power allowed by the proof system. The fourth aspect concerns the definition of a function to compute series, with an application to the series that is used to compute Euler's number e. All these experiments should be reproducible in any proof system that supports co-inductive types, co-recursion, and general forms of terminating recursion, but we performed them with the Coq system [12, 3, 14].
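
    A rough analogue of such a co-inductive digit stream, written in C++ rather than in Coq and intended only to illustrate the data type (not the formal development), is a head digit paired with a lazily evaluated tail:

        // Co-inductive-style stream of decimal digits, approximated in C++ with a
        // thunk for the tail. digitsOf(p, q) unfolds the digits of p/q in [0, 1);
        // the base-10 encoding is an assumption of this sketch, not the paper's.
        #include <cstdio>
        #include <functional>
        #include <memory>

        struct DigitStream {
            int head;                                             // next digit
            std::function<std::shared_ptr<DigitStream>()> tail;   // rest, on demand
        };

        std::shared_ptr<DigitStream> digitsOf(long long p, long long q) {
            return std::make_shared<DigitStream>(DigitStream{
                static_cast<int>(10 * p / q),                     // long division step
                [=] { return digitsOf(10 * p % q, q); }           // co-recursive call
            });
        }

        int main() {
            auto s = digitsOf(1, 7);                              // 0.142857142857...
            for (int i = 0; i < 12; ++i) {                        // force 12 digits
                std::printf("%d", s->head);
                s = s->tail();
            }
            std::printf("\n");
        }

    In the Coq development the corresponding objects are genuine co-inductive types, and the point is to reason formally about functions (addition, affine combinations, series) defined by such co-recursion.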

    Efficient implementation of symplectic implicit Runge-Kutta schemes with simplified Newton iterations

    We are concerned with the efficient implementation of symplectic implicit Runge-Kutta (IRK) methods applied to systems of (not necessarily Hamiltonian) ordinary differential equations by means of Newton-like iterations. We pay particular attention to symmetric symplectic IRK schemes (such as collocation methods with Gaussian nodes). For an s-stage IRK scheme used to integrate a d-dimensional system of ordinary differential equations, the application of simplified versions of Newton iterations requires solving at each step several linear systems (one per iteration) with the same sd × sd real coefficient matrix. We propose rewriting such sd-dimensional linear systems as equivalent (s+1)d-dimensional systems that can be solved by performing the LU decompositions of [s/2] + 1 real matrices of size d × d. We present a C implementation (based on Newton-like iterations) of Runge-Kutta collocation methods with Gaussian nodes that makes use of such a rewriting of the linear systems and takes special care in reducing the effect of round-off errors. We report some numerical experiments that demonstrate the reduced round-off error propagation of our implementation.
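
    For context, the sd × sd linear systems in question are the standard simplified-Newton systems of an implicit Runge-Kutta step (this is the textbook formulation, not a quotation from the paper): with Butcher matrix A = (a_{ij}), step size h, stage increments Z = (Z_1, ..., Z_s), and a Jacobian J ≈ ∂f/∂y frozen at the step point, each iteration solves

        \bigl( I_{sd} - h\,(A \otimes J) \bigr)\, \Delta Z^{(m)} = -R\bigl(Z^{(m)}\bigr),
        \qquad
        R_i(Z) = Z_i - h \sum_{j=1}^{s} a_{ij}\, f\bigl(y_n + Z_j\bigr), \quad i = 1, \dots, s,

    so the same sd × sd coefficient matrix is reused for every iteration within a step; the rewriting proposed above replaces these systems by equivalent (s+1)d-dimensional ones whose solution needs only [s/2] + 1 LU decompositions of d × d matrices.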