Search CORE

453 research outputs found

Modular SIMD arithmetic in Mathemagix

Author: Lecerf Grégoire
Quintin Guillaume
van der Hoeven Joris
Publication venue
Publication date: 29/06/2014
Field of study

Modular integer arithmetic occurs in many algorithms for computer algebra, cryptography, and error correcting codes. Although recent microprocessors typically offer a wide range of highly optimized arithmetic functions, modular integer operations still require dedicated implementations. In this article, we survey existing algorithms for modular integer arithmetic, and present detailed vectorized counterparts. We also present several applications, such as fast modular Fourier transforms and multiplication of integer polynomials and matrices. The vectorized algorithms have been implemented in C++ inside the free computer algebra and analysis system Mathemagix. The performance of our implementation is illustrated by various benchmarks

arXiv.org e-Print Archive

HAL-UNILIM

HAL-Polytechnique

Implementation and analysis of the generalised new Mersenne number transforms for encryption

Author: Rutter Nick
Publication venue: Newcastle University
Publication date: 01/01/2015
Field of study

PhD ThesisEncryption is very much a vast subject covering myriad techniques to conceal and safeguard data and communications. Of the techniques that are available, methodologies that incorporate the number theoretic transforms (NTTs) have gained recognition, specifically the new Mersenne number transform (NMNT). Recently, two new transforms have been introduced that extend the NMNT to a new generalised suite of transforms referred to as the generalised NMNT (GNMNT). These two new transforms are termed the odd NMNT (ONMNT) and the odd-squared NMNT (O2NMNT). Being based on the Mersenne numbers, the GNMNTs are extremely versatile with respect to vector lengths. The GNMNTs are also capable of being implemented using fast algorithms, employing multiple and combinational radices over one or more dimensions. Algorithms for both the decimation-in-time (DIT) and -frequency (DIF) methodologies using radix-2, radix-4 and split-radix are presented, including their respective complexity and performance analyses. Whilst the original NMNT has seen a significant amount of research applied to it with respect to encryption, the ONMNT and O2NMNT can utilise similar techniques that are proven to show stronger characteristics when measured using established methodologies defining diffusion. Analyses in diffusion using a small but reasonably sized vector-space with the GNMNTs will be exhaustively assessed and a comparison with the Rijndael cipher, the current advanced encryption standard (AES) algorithm, will be presented that will confirm strong diffusion characteristics. Implementation techniques using general-purpose computing on graphics processing units (GPGPU) have been applied, which are further assessed and discussed. Focus is drawn upon the future of cryptography and in particular cryptology, as a consequence of the emergence and rapid progress of GPGPU and consumer based parallel processing

Newcastle University eTheses

Hairy graphs and the unstable homology of Mod(g,s), Out(F_n) and Aut(F_n)

Author: Conant James
Kassabov Martin
Vogtmann Karen
Publication venue: 'Wiley'
Publication date: 13/09/2012
Field of study

We study a family of Lie algebras {hO} which are defined for cyclic operads O. Using his graph homology theory, Kontsevich identified the homology of two of these Lie algebras (corresponding to the Lie and associative operads) with the cohomology of outer automorphism groups of free groups and mapping class groups of punctured surfaces, respectively. In this paper we introduce a hairy graph homology theory for O. We show that the homology of hO embeds in hairy graph homology via a trace map which generalizes the trace map defined by S. Morita. For the Lie operad we use the trace map to find large new summands of the abelianization of hO which are related to classical modular forms for SL(2,Z). Using cusp forms we construct new cycles for the unstable homology of Out(F_n), and using Eisenstein series we find new cycles for Aut(F_n). For the associative operad we compute the first homology of the hairy graph complex by adapting an argument of Morita, Sakasai and Suzuki, who determined the complete abelianization of hO in the associative case.Comment: Some typos fixed. In an earlier version, we had made a conjecture about the image of the trace map, which we have proven and will be included in a future paper. Some comments in this paper have been changed to reflect this. To appear in J. To

arXiv.org e-Print Archive

Crossref

Recommended from our members

A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units

Author: Emmart Niall
Publication venue: ScholarWorks@UMass Amherst
Publication date: 21/03/2018
Field of study

Multiple precision (MP) arithmetic is a core building block of a wide variety of algorithms in computational mathematics and computer science. In mathematics MP is used in computational number theory, geometric computation, experimental mathematics, and in some random matrix problems. In computer science, MP arithmetic is primarily used in cryptographic algorithms: securing communications, digital signatures, and code breaking. In most of these application areas, the factor that limits performance is the MP arithmetic. The focus of our research is to build and analyze highly optimized libraries that allow the MP operations to be offloaded from the CPU to the GPU. Our goal is to achieve an order of magnitude improvement over the CPU in three key metrics: operations per second per socket, operations per watt, and operation per second per dollar. What we find is that the SIMD design and balance of compute, cache, and bandwidth resources on the GPU is quite different from the CPU, so libraries such as GMP cannot simply be ported to the GPU. New approaches and algorithms are required to achieve high performance and high utilization of GPU resources. Further, we find that low-level ISA differences between GPU generations means that an approach that works well on one generation might not run well on the next. Here we report on our progress towards MP arithmetic libraries on the GPU in four areas: (1) large integer addition, subtraction, and multiplication; (2) high performance modular multiplication and modular exponentiation (the key operations for cryptographic algorithms) across generations of GPUs; (3) high precision floating point addition, subtraction, multiplication, division, and square root; (4) parallel short division, which we prove is asymptotically optimal on EREW and CREW PRAMs

ScholarWorks@UMass Amherst

Highly Automated Formal Verification of Arithmetic Circuits

Author: Sayed-Ahmed Amr
Publication venue
Publication date: 01/01/2017
Field of study

This dissertation investigates the problems of two distinctive formal verification techniques for verifying large scale multiplier circuits and proposes two approaches to overcome some of these problems. The first technique is equivalence checking based on recurrence relations, while the second one is the symbolic computation technique which is based on the theory of Gröbner bases. This investigation demonstrates that approaches based on symbolic computation have better scalability and more robustness than state-of-the-art equivalence checking techniques for verification of arithmetic circuits. According to this conclusion, the thesis leverages the symbolic computation technique to verify floating-point designs. It proposes a new algebraic equivalence checking, in contrast to classical combinational equivalence checking, the proposed technique is capable of checking the equivalence of two circuits which have different architectures of arithmetic units as well as control logic parts, e.g., floating-point multipliers

E-LIB Dokumentserver - Staats und Universitätsbibliothek Bremen

Hyperbolic semi-adequate links

Author: David Futer
Efstratia Kalfagianni
Jessica
S. Purcell
Publication venue: 'International Press of Boston'
Publication date: 14/10/2014
Field of study

We provide a diagrammatic criterion for semi-adequate links to be hyperbolic. We also give a conjectural description of the satellite structures of semi-adequate links. One application of our result is that the closures of sufficiently complicated positive braids are hyperbolic links.Comment: 25 pages, 9 figure

arXiv.org e-Print Archive

CiteSeerX

근사 연산에 대한 계산 검증 연구

Author: 김동우
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :자연과학대학 수리과학부,2020. 2. 천정희.Verifiable Computing (VC) is a complexity-theoretic method to secure the integrity of computations. The need is increasing as more computations are outsourced to untrusted parties, e.g., cloud platforms. Existing techniques, however, have mainly focused on exact computations, but not approximate arithmetic, e.g., floating-point or fixed-point arithmetic. This makes it hard to apply them to certain types of computations (e.g., machine learning, data analysis, and scientific computation) that inherently require approximate arithmetic. In this thesis, we present an efficient interactive proof system for arithmetic circuits with rounding gates that can represent approximate arithmetic. The main idea is to represent the rounding gate into a small sub-circuit, and reuse the machinery of the Goldwasser, Kalai, and Rothblum's protocol (also known as the GKR protocol) and its recent refinements. Specifically, we shift the algebraic structure from a field to a ring to better deal with the notion of ``digits'', and generalize the original GKR protocol over a ring. Then, we represent the rounding operation by a low-degree polynomial over a ring, and develop a novel, optimal circuit construction of an arbitrary polynomial to transform the rounding polynomial to an optimal circuit representation. Moreover, we further optimize the proof generation cost for rounding by employing a Galois ring. We provide experimental results that show the efficiency of our system for approximate arithmetic. For example, our implementation performed two orders of magnitude better than the existing system for a nested 128 x 128 matrix multiplication of depth 12 on the 16-bit fixed-point arithmetic.계산검증 기술은 계산의 무결성을 확보하기 위한 계산 복잡도 이론적 방법이다. 최근 많은 계산이 클라우드 플랫폼과 같은 제3자에게 외주됨에 따라 그 필요성이 증가하고 있다. 그러나 기존의 계산검증 기술은 비근사 연산만을 고려했을 뿐, 근사 연산 (부동 소수점 또는 고정 소수점 연산)은 고려하지 않았다. 따라서 본질적으로 근사 연산이 필요한 특정 유형의 계산 (기계 학습, 데이터 분석 및 과학 계산 등)에 적용하기 어렵다는 문제가 있었다. 이 논문은 반올림 게이트를 수반하는 산술 회로를 위한 효율적인 대화형 증명 시스템을 제시한다. 이러한 산술 회로는 근사 연산을 효율적으로 표현할 수 있으므로, 근사 연산에 대한 효율적인 계산 검증이 가능하다. 주요 아이디어는 반올림 게이트를 작은 회로로 변환한 후, 여기에 Goldwasser, Kalai, 및 Rothblum의 프로토콜 (GKR 프로토콜)과 최근의 개선을 적용하는 것이다. 구체적으로, 대수적 객체를 유한체가 아닌 ``숫자''를 보다 잘 처리할 수 있는 환으로 치환한 후, 환 위에서 적용 가능하도록 기존의 GKR 프로토콜을 일반화하였다. 이후, 반올림 연산을 환에서 차수가 낮은 다항식으로 표현하고, 다항식 연산을 최적의 회로 표현으로 나타내는 새롭고 최적화된 회로 구성을 개발하였다. 또한, 갈루아 환을 사용하여 반올림을 위한 증명 생성 비용을 더욱 최적화하였다. 마지막으로, 실험을 통해 우리의 근사 연산 검증 시스템의 효율성을 확인하였다. 예를 들어, 우리의 시스템은 구현 시, 16 비트 고정 소수점 연산을 통한 깊이 12의 반복된 128 x 128 행렬 곱셈의 검증에 있어 기존 시스템보다 약 100배 더 나은 성능을 보인다.1 Introduction 1 1.1 Verifiable Computing 2 1.2 Verifiable Approximate Arithmetic 3 1.2.1 Problem: Verification of Rounding Arithmetic 3 1.2.2 Motivation: Verifiable Machine Learning (AI) 4 1.3 List of Papers 5 2 Preliminaries 6 2.1 Interactive Proof and Argument 6 2.2 Sum-Check Protocol 7 2.3 The GKR Protocol 10 2.4 Notation and Cost Model 14 3 Related Work 15 3.1 Interactive Proofs 15 3.2 (Non-)Interactive Arguments 17 4 Interactive Proof for Rounding Arithmetic 20 4.1 Overview of Our Approach and Result 20 4.2 Interactive Proof over a Ring 26 4.2.1 Sum-Check Protocol over a Ring 27 4.2.2 The GKR Protocol over a Ring 29 4.3 Verifiable Rounding Operation 31 4.3.1 Lowest-Digit-Removal Polynomial over Z_{p^e} 32 4.3.2 Verification of Division-by-p Layer 33 4.4 Delegation of Polynomial Evaluation in Optimal Cost 34 4.4.1 Overview of Our Circuit Construction 35 4.4.2 Our Circuit for Polynomial Evaluation 37 4.4.3 Cost Analysis 40 4.5 Cost Optimization 45 4.5.1 Galois Ring over Z_{p^e} and a Sampling Set 45 4.5.2 Optimization of Prover's Cost for Rounding Layers 47 5 Experimental Results 50 5.1 Experimental Setup 50 5.2 Verifiable Rounding Operation 51 5.2.1 Effectiveness of Optimization via Galois Ring 51 5.2.2 Efficiency of Verifiable Rounding Operation 53 5.3 Comparison to Thaler's Refinement of GKR Protocol 54 5.4 Discussion 57 6 Conclusions 60 6.1 Towards Verifiable AI 61 6.2 Verifiable Cryptographic Computation 62 Abstract (in Korean) 74Docto

SNU Open Repository and Archive

High performance SIMD modular arithmetic for polynomial evaluation

Author: Fleury Ambroise
Fortin Pierre
Lemaire François
Monagan Michael
Publication venue
Publication date: 24/04/2020
Field of study

Two essential problems in Computer Algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be efficiently solved thanks to multiple polynomial evaluations in two variables using modular arithmetic. In this article, we focus on the efficient computation of such polynomial evaluations on one single CPU core. We first show how to leverage SIMD computing for modular arithmetic on AVX2 and AVX-512 units, using both intrinsics and OpenMP compiler directives. Then we manage to increase the operational intensity and to exploit instruction-level parallelism in order to increase the compute efficiency of these polynomial evaluations. All this results in the end to performance gains up to about 5x on AVX2 and 10x on AVX-512

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

Higher spin polynomial solutions of quantum Knizhnik--Zamolodchikov equation

Author: Fonseca T.
Zinn-Justin P.
Publication venue
Publication date: 17/10/2013
Field of study

We provide explicit formulae for highest-weight to highest-weight correlation functions of perfect vertex operators of

U_q(\hat{\mathfrak{sl}(2)})

at arbitrary integer level

\ell

. They are given in terms of certain Macdonald polynomials. We apply this construction to the computation of the ground state of higher spin vertex models, spin chains (spin

\ell/2

XXZ) or loop models in the root of unity case

q=-e^{-i\pi/(\ell+2)}

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

HAL Université de Savoie