453 research outputs found
Modular SIMD arithmetic in Mathemagix
Modular integer arithmetic occurs in many algorithms for computer algebra,
cryptography, and error correcting codes. Although recent microprocessors
typically offer a wide range of highly optimized arithmetic functions, modular
integer operations still require dedicated implementations. In this article, we
survey existing algorithms for modular integer arithmetic, and present detailed
vectorized counterparts. We also present several applications, such as fast
modular Fourier transforms and multiplication of integer polynomials and
matrices. The vectorized algorithms have been implemented in C++ inside the
free computer algebra and analysis system Mathemagix. The performance of our
implementation is illustrated by various benchmarks.
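As an illustration of the kind of dedicated modular routine this abstract refers to, here is a minimal scalar sketch of Montgomery multiplication, a standard division-free technique. It is not taken from the article or from Mathemagix; the function names and the 32-bit word size `k` are assumptions chosen for the example.

```python
# Hypothetical helper names; a sketch of Montgomery reduction for an odd modulus n < 2**k.

def montgomery_setup(n, k=32):
    """Precompute r = 2**k, n' with n * n' = -1 (mod r), and r**2 mod n."""
    assert n % 2 == 1 and 0 < n < (1 << k)
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # modular inverse needs Python 3.8+
    r2 = (r * r) % n
    return r, n_prime, r2


def redc(t, n, k, n_prime):
    """Return t * r^(-1) mod n for 0 <= t < n * r, using only shifts and masks."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = t * n' mod r
    u = (t + m * n) >> k                # exact division by r
    return u - n if u >= n else u       # single conditional correction


def mont_mul(a, b, n, k=32):
    """Compute a * b mod n for 0 <= a, b < n via the Montgomery domain."""
    r, n_prime, r2 = montgomery_setup(n, k)
    a_bar = redc(a * r2, n, k, n_prime)      # a * r mod n
    b_bar = redc(b * r2, n, k, n_prime)      # b * r mod n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # a * b * r mod n
    return redc(prod_bar, n, k, n_prime)     # leave the Montgomery domain
```

In practice the setup constants would be computed once per modulus and operands kept in Montgomery form across many multiplications; the conversion on every call here is only for clarity.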
Implementation and analysis of the generalised new Mersenne number transforms for encryption
PhD Thesis. Encryption is a vast subject covering myriad techniques to conceal
and safeguard data and communications. Of the techniques that are available,
methodologies that incorporate the number theoretic transforms (NTTs) have gained
recognition, specifically the new Mersenne number transform (NMNT). Recently, two
new transforms have been introduced that extend the NMNT to a new generalised
suite of transforms referred to as the generalised NMNT (GNMNT). These two
new transforms are termed the odd NMNT (ONMNT) and the odd-squared NMNT
(O2NMNT).
Being based on the Mersenne numbers, the GNMNTs are extremely versatile with
respect to vector lengths. The GNMNTs are also capable of being implemented
using fast algorithms, employing multiple and combinational radices over one or
more dimensions. Algorithms for both the decimation-in-time (DIT) and -frequency
(DIF) methodologies using radix-2, radix-4 and split-radix are presented, including
their respective complexity and performance analyses.
Whilst the original NMNT has attracted a significant amount of research with
respect to encryption, the ONMNT and O2NMNT can utilise similar techniques
and are proven to show stronger characteristics when measured using established
methodologies defining diffusion. Diffusion with the GNMNTs is assessed
exhaustively over a small but reasonably sized vector space, and a comparison
with the Rijndael cipher, the current advanced encryption standard (AES) algorithm,
is presented that confirms strong diffusion characteristics.
Implementation techniques using general-purpose computing on graphics processing
units (GPGPU) have been applied, which are further assessed and discussed. Focus
is drawn upon the future of cryptography and in particular cryptology, as a
consequence of the emergence and rapid progress of GPGPU and consumer based
parallel processing.
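For readers unfamiliar with number theoretic transforms, the following is a minimal sketch of a generic radix-2 decimation-in-time NTT. It is not the NMNT, ONMNT or O2NMNT from the thesis: for simplicity it works modulo the Fermat prime 257 with primitive root 3, both chosen here as assumptions, and all function names are hypothetical.

```python
# Generic radix-2 DIT number theoretic transform (illustrative, not the NMNT).
P = 257  # assumed prime modulus; P - 1 = 256 is a power of two
G = 3    # a primitive root modulo 257


def ntt(x, root):
    """Recursive radix-2 DIT transform of x, where root has order len(x) mod P."""
    n = len(x)
    if n == 1:
        return list(x)
    even = ntt(x[0::2], root * root % P)
    odd = ntt(x[1::2], root * root % P)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P                 # twiddle factor times odd part
        out[k] = (even[k] + t) % P         # butterfly, upper half
        out[k + n // 2] = (even[k] - t) % P  # butterfly, lower half
        w = w * root % P
    return out


def forward(x):
    """Forward transform; len(x) must be a power of two dividing 256."""
    n = len(x)
    return ntt(x, pow(G, (P - 1) // n, P))


def inverse(X):
    """Inverse transform: use the inverse root and scale by n^(-1) mod P."""
    n = len(X)
    root_inv = pow(pow(G, (P - 1) // n, P), -1, P)
    n_inv = pow(n, -1, P)
    return [v * n_inv % P for v in ntt(X, root_inv)]
```

A round trip such as `inverse(forward([1, 2, 3, 4, 5, 6, 7, 8]))` should return the input unchanged, since all arithmetic is exact modulo `P`.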
Hairy graphs and the unstable homology of Mod(g,s), Out(F_n) and Aut(F_n)
We study a family of Lie algebras {hO} which are defined for cyclic operads
O. Using his graph homology theory, Kontsevich identified the homology of two
of these Lie algebras (corresponding to the Lie and associative operads) with
the cohomology of outer automorphism groups of free groups and mapping class
groups of punctured surfaces, respectively. In this paper we introduce a hairy
graph homology theory for O. We show that the homology of hO embeds in hairy
graph homology via a trace map which generalizes the trace map defined by S.
Morita. For the Lie operad we use the trace map to find large new summands of
the abelianization of hO which are related to classical modular forms for
SL(2,Z). Using cusp forms we construct new cycles for the unstable homology of
Out(F_n), and using Eisenstein series we find new cycles for Aut(F_n). For the
associative operad we compute the first homology of the hairy graph complex by
adapting an argument of Morita, Sakasai and Suzuki, who determined the complete
abelianization of hO in the associative case.
Comment: Some typos fixed. In an earlier version, we had made a conjecture
about the image of the trace map, which we have since proven; the proof will be
included in a future paper. Some comments in this paper have been changed to
reflect this. To appear in J. To
A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units
Multiple precision (MP) arithmetic is a core building block of a wide variety of algorithms in computational mathematics and computer science. In mathematics, MP is used in computational number theory, geometric computation, experimental mathematics, and in some random matrix problems. In computer science, MP arithmetic is primarily used in cryptographic algorithms: securing communications, digital signatures, and code breaking. In most of these application areas, the factor that limits performance is the MP arithmetic. The focus of our research is to build and analyze highly optimized libraries that allow the MP operations to be offloaded from the CPU to the GPU. Our goal is to achieve an order of magnitude improvement over the CPU in three key metrics: operations per second per socket, operations per watt, and operations per second per dollar. What we find is that the SIMD design and the balance of compute, cache, and bandwidth resources on the GPU are quite different from those of the CPU, so libraries such as GMP cannot simply be ported to the GPU. New approaches and algorithms are required to achieve high performance and high utilization of GPU resources. Further, we find that low-level ISA differences between GPU generations mean that an approach that works well on one generation might not run well on the next.
Here we report on our progress towards MP arithmetic libraries on the GPU in four areas: (1) large integer addition, subtraction, and multiplication; (2) high performance modular multiplication and modular exponentiation (the key operations for cryptographic algorithms) across generations of GPUs; (3) high precision floating point addition, subtraction, multiplication, division, and square root; (4) parallel short division, which we prove is asymptotically optimal on EREW and CREW PRAMs.
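To make the first of these areas concrete, here is a minimal sequential sketch of multiple precision addition on a little-endian array of 32-bit limbs. The limb width, the representation and all function names are assumptions made for illustration; this is not the library described in the abstract, and it makes no attempt at GPU parallelism.

```python
# Illustrative limb-based big-integer addition; limb width is an assumption.
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n):
    """Split a non-negative integer into little-endian 32-bit limbs."""
    limbs = []
    while True:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
        if n == 0:
            break
    return limbs


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two limb arrays, propagating the carry from low to high limbs."""
    out = []
    carry = 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s & LIMB_MASK)   # low 32 bits stay in this limb
        carry = s >> LIMB_BITS      # overflow moves to the next limb
    if carry:
        out.append(carry)
    return out
```

The carry chain in `add_limbs` is inherently sequential, which hints at why a CPU-style loop like this one does not map directly onto SIMD hardware.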
Highly Automated Formal Verification of Arithmetic Circuits
This dissertation investigates the problems of two distinct formal verification techniques for verifying large-scale multiplier circuits and proposes two approaches to overcome some of these problems. The first technique is equivalence checking based on recurrence relations, while the second is the symbolic computation technique based on the theory of Gröbner bases. The investigation demonstrates that approaches based on symbolic computation scale better and are more robust than state-of-the-art equivalence checking techniques for the verification of arithmetic circuits. Building on this conclusion, the thesis leverages the symbolic computation technique to verify floating-point designs. It proposes a new algebraic equivalence checking technique. In contrast to classical combinational equivalence checking, the proposed technique is capable of checking the equivalence of two circuits that have different architectures of arithmetic units as well as control logic parts, e.g., floating-point multipliers.
Hyperbolic semi-adequate links
We provide a diagrammatic criterion for semi-adequate links to be hyperbolic.
We also give a conjectural description of the satellite structures of
semi-adequate links. One application of our result is that the closures of
sufficiently complicated positive braids are hyperbolic links.
Comment: 25 pages, 9 figures
A Study on Verifiable Computing for Approximate Arithmetic (근사 연산에 대한 계산 검증 연구)
Thesis (Ph.D.), Seoul National University Graduate School, College of Natural Sciences, Department of Mathematical Sciences, February 2020. Jung Hee Cheon (천정희).
Verifiable Computing (VC) is a complexity-theoretic method to secure the integrity of computations. The need is increasing as more computations are outsourced to untrusted parties, e.g., cloud platforms. Existing techniques, however, have mainly focused on exact computations, but not approximate arithmetic, e.g., floating-point or fixed-point arithmetic. This makes it hard to apply them to certain types of computations (e.g., machine learning, data analysis, and scientific computation) that inherently require approximate arithmetic.
In this thesis, we present an efficient interactive proof system for arithmetic circuits with rounding gates that can represent approximate arithmetic. The main idea is to represent the rounding gate as a small sub-circuit, and reuse the machinery of the protocol of Goldwasser, Kalai, and Rothblum (also known as the GKR protocol) and its recent refinements. Specifically, we shift the algebraic structure from a field to a ring to better deal with the notion of "digits", and generalize the original GKR protocol over a ring. Then, we represent the rounding operation by a low-degree polynomial over a ring, and develop a novel, optimal circuit construction of an arbitrary polynomial to transform the rounding polynomial to an optimal circuit representation. Moreover, we further optimize the proof generation cost for rounding by employing a Galois ring. We provide experimental results that show the efficiency of our system for approximate arithmetic. For example, our implementation performed two orders of magnitude better than the existing system for a nested 128 x 128 matrix multiplication of depth 12 in 16-bit fixed-point arithmetic.
1 Introduction
1.1 Verifiable Computing
1.2 Verifiable Approximate Arithmetic
1.2.1 Problem: Verification of Rounding Arithmetic
1.2.2 Motivation: Verifiable Machine Learning (AI)
1.3 List of Papers
2 Preliminaries
2.1 Interactive Proof and Argument
2.2 Sum-Check Protocol
2.3 The GKR Protocol
2.4 Notation and Cost Model
3 Related Work
3.1 Interactive Proofs
3.2 (Non-)Interactive Arguments
4 Interactive Proof for Rounding Arithmetic
4.1 Overview of Our Approach and Result
4.2 Interactive Proof over a Ring
4.2.1 Sum-Check Protocol over a Ring
4.2.2 The GKR Protocol over a Ring
4.3 Verifiable Rounding Operation
4.3.1 Lowest-Digit-Removal Polynomial over Z_{p^e}
4.3.2 Verification of Division-by-p Layer
4.4 Delegation of Polynomial Evaluation in Optimal Cost
4.4.1 Overview of Our Circuit Construction
4.4.2 Our Circuit for Polynomial Evaluation
4.4.3 Cost Analysis
4.5 Cost Optimization
4.5.1 Galois Ring over Z_{p^e} and a Sampling Set
4.5.2 Optimization of Prover's Cost for Rounding Layers
5 Experimental Results
5.1 Experimental Setup
5.2 Verifiable Rounding Operation
5.2.1 Effectiveness of Optimization via Galois Ring
5.2.2 Efficiency of Verifiable Rounding Operation
5.3 Comparison to Thaler's Refinement of GKR Protocol
5.4 Discussion
6 Conclusions
6.1 Towards Verifiable AI
6.2 Verifiable Cryptographic Computation
Abstract (in Korean)
Docto
High performance SIMD modular arithmetic for polynomial evaluation
Two essential problems in Computer Algebra, namely polynomial factorization
and polynomial greatest common divisor computation, can be efficiently solved
thanks to multiple polynomial evaluations in two variables using modular
arithmetic. In this article, we focus on the efficient computation of such
polynomial evaluations on one single CPU core. We first show how to leverage
SIMD computing for modular arithmetic on AVX2 and AVX-512 units, using both
intrinsics and OpenMP compiler directives. Then we manage to increase the
operational intensity and to exploit instruction-level parallelism in order to
increase the compute efficiency of these polynomial evaluations. Together, these
optimizations yield performance gains of up to about 5x on AVX2 and 10x on
AVX-512.
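The basic operation being accelerated can be sketched in scalar form as follows: evaluating a polynomial in two variables modulo a prime with nested Horner schemes. This is only an assumed baseline for illustration, with hypothetical function names and coefficient layout; it contains none of the SIMD, operational-intensity or instruction-level optimizations the article describes.

```python
# Scalar baseline sketch: bivariate polynomial evaluation modulo p via Horner's rule.

def horner_mod(coeffs, x, p):
    """Evaluate a univariate polynomial at x mod p; coeffs are highest degree first."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % p
    return acc


def eval_bivariate_mod(rows, x, y, p):
    """Evaluate f(x, y) mod p.

    rows[i] holds the x-coefficients (highest degree first) of one power of y,
    and rows are ordered from the highest power of y down to y**0.
    """
    acc = 0
    for row in rows:
        acc = (acc * y + horner_mod(row, x, p)) % p
    return acc


# Example: f(x, y) = (3x^2 + 2x + 1) * y + (x + 5), evaluated modulo 101.
rows = [[3, 2, 1], [0, 1, 5]]
value = eval_bivariate_mod(rows, 4, 7, 101)
```

Evaluating the same polynomial at many points repeats this loop with different `x` and `y`, which is the kind of data-parallel workload that SIMD units are designed for.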
Higher spin polynomial solutions of quantum Knizhnik--Zamolodchikov equation
We provide explicit formulae for highest-weight to highest-weight correlation
functions of perfect vertex operators of at
arbitrary integer level . They are given in terms of certain Macdonald
polynomials. We apply this construction to the computation of the ground state
of higher spin vertex models, spin chains (spin XXZ) or loop models in
the root of unity case