Search CORE

14,854 research outputs found

Dynamic MDS Matrices for Substantial Cryptographic Strength

Author: Malik Muhammad Yasir
No Jong-Seon
Publication venue
Publication date: 08/04/2011
Field of study

Ciphers get their strength from the mathematical functions of confusion and diffusion, also known as substitution and permutation. These were the basics of classical cryptography and they are still the basic part of modern ciphers. In block ciphers diffusion is achieved by the use of Maximum Distance Separable (MDS) matrices. In this paper we present some methods for constructing dynamic (and random) MDS matrices.Comment: Short paper at WISA'10, 201

arXiv.org e-Print Archive

CiteSeerX

Cryptology ePrint Archive

AlSub: Fully Parallel and Modular Subdivision

Author: Mlakar Daniel
Seidel Hans-Peter
Steinberger Markus
Winter Martin
Zayer Rhaleb
Publication venue
Publication date: 01/01/2018
Field of study

In recent years, mesh subdivision---the process of forging smooth free-form surfaces from coarse polygonal meshes---has become an indispensable production instrument. Although subdivision performance is crucial during simulation, animation and rendering, state-of-the-art approaches still rely on serial implementations for complex parts of the subdivision process. Therefore, they often fail to harness the power of modern parallel devices, like the graphics processing unit (GPU), for large parts of the algorithm and must resort to time-consuming serial preprocessing. In this paper, we show that a complete parallelization of the subdivision process for modern architectures is possible. Building on sparse matrix linear algebra, we show how to structure the complete subdivision process into a sequence of algebra operations. By restructuring and grouping these operations, we adapt the process for different use cases, such as regular subdivision of dynamic meshes, uniform subdivision for immutable topology, and feature-adaptive subdivision for efficient rendering of animated models. As the same machinery is used for all use cases, identical subdivision results are achieved in all parts of the production pipeline. As a second contribution, we show how these linear algebra formulations can effectively be translated into efficient GPU kernels. Applying our strategies to

\sqrt{3}

, Loop and Catmull-Clark subdivision shows significant speedups of our approach compared to state-of-the-art solutions, while we completely avoid serial preprocessing.Comment: Changed structure Added content Improved description

arXiv.org e-Print Archive

MPG.PuRe

Recommended from our members

Two-dimensional DCT/IDCT architecture

Author: Aggoun A
Jollah I
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2003
Field of study

A fully parallel architecture for the computation of a two-dimensional (2-D) discrete cosine transform (DCT), based on row-column decomposition is presented. It uses the same one dimensional (1-D) DCT unit for the row and column computations and (N2+N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2-D DCT. Each 1-D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2” arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to one. Using this procedure, the 2-D DCT architecture requires less than N2 multipliers (in terms of area occupied) and only 2N adders. It can compute a N x N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay

Brunel University Research Archive

Format Abstraction for Sparse Tensor Algebra Compilers

Author: Amarasinghe Saman
Chou Stephen
Kjolstad Fredrik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/11/2018
Field of study

This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any tensor algebra expression on any combination of the aforementioned formats. To demonstrate our technique, we have implemented it in the taco tensor algebra compiler. Our modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations. Furthermore, by extending taco to support a wider range of formats specialized for different application and data characteristics, we can improve end-user application performance. For example, if input data is provided in the COO format, our technique allows computing a single matrix-vector multiplication directly with the data in COO, which is up to 3.6

\times

faster than by first converting the data to CSR.Comment: Presented at OOPSLA 201

arXiv.org e-Print Archive

DSpace@MIT

Development of symbolic algorithms for certain algebraic processes

Author: Abd. Rahman Ali
Aris Nor'aini
Publication venue: Research Management Center
Publication date: 01/04/2007
Field of study

This study investigates the problem of computing the exact greatest common divisor of two polynomials relative to an orthogonal basis, defined over the rational number field. The main objective of the study is to design and implement an effective and efficient symbolic algorithm for the general class of dense polynomials, given the rational number defining terms of their basis. From a general algorithm using the comrade matrix approach, the nonmodular and modular techniques are prescribed. If the coefficients of the generalized polynomials are multiprecision integers, multiprecision arithmetic will be required in the construction of the comrade matrix and the corresponding systems coefficient matrix. In addition, the application of the nonmodular elimination technique on this coefficient matrix extensively applies multiprecision rational number operations. The modular technique is employed to minimize the complexity involved in such computations. A divisor test algorithm that enables the detection of an unlucky reduction is a crucial device for an effective implementation of the modular technique. With the bound of the true solution not known a priori, the test is devised and carefully incorporated into the modular algorithm. The results illustrate that the modular algorithm illustrate its best performance for the class of relatively prime polynomials. The empirical computing time results show that the modular algorithm is markedly superior to the nonmodular algorithms in the case of sufficiently dense Legendre basis polynomials with a small GCD solution. In the case of dense Legendre basis polynomials with a big GCD solution, the modular algorithm is significantly superior to the nonmodular algorithms in higher degree polynomials. For more definitive conclusions, the computing time functions of the algorithms that are presented in this report have been worked out. Further investigations have also been suggested

Universiti Teknologi Malaysia Institutional Repository

Implementing Shor's algorithm on Josephson Charge Qubits

Author: A. V. Aho
D. E. Knuth
D. R. Stinson
G. H. Golub
J. Gruska
M. A. Nielsen
M. H. Devoret
M. Hirvensalo
P. W. Shor
S. Beauregard
T. Yamamoto
W. G. van der Wiel
Y. Nakamura
Publication venue: 'American Physical Society (APS)'
Publication date: 02/03/2004
Field of study

We investigate the physical implementation of Shor's factorization algorithm on a Josephson charge qubit register. While we pursue a universal method to factor a composite integer of any size, the scheme is demonstrated for the number 21. We consider both the physical and algorithmic requirements for an optimal implementation when only a small number of qubits is available. These aspects of quantum computation are usually the topics of separate research communities; we present a unifying discussion of both of these fundamental features bridging Shor's algorithm to its physical realization using Josephson junction qubits. In order to meet the stringent requirements set by a short decoherence time, we accelerate the algorithm by decomposing the quantum circuit into tailored two- and three-qubit gates and we find their physical realizations through numerical optimization.Comment: 12 pages, submitted to Phys. Rev.

arXiv.org e-Print Archive

Crossref

Contract-Based General-Purpose GPU Programming

Author: Kolesnichenko Alexey
Meyer Bertrand
Nanz Sebastian
Poskitt Christopher M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/10/2014
Field of study

Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the difficulty of programming them and the low-level control of the hardware required to achieve good performance. This paper suggests a programming library, SafeGPU, that aims at striking a balance between programmer productivity and performance, by making GPU data-parallel operations accessible from within a classical object-oriented programming language. The solution is integrated with the design-by-contract approach, which increases confidence in functional program correctness by embedding executable program specifications into the program text. We show that our library leads to modular and maintainable code that is accessible to GPGPU non-experts, while providing performance that is comparable with hand-written CUDA code. Furthermore, runtime contract checking turns out to be feasible, as the contracts can be executed on the GPU

arXiv.org e-Print Archive

CiteSeerX

Repository for Publications and Research Data

Crossref

Institutional Knowledge at Singapore Management University