10 research outputs found
Worst--Case Analysis of Weber's Algorithm
11 pagesInternational audienceRecently, Ken Weber introduced an algorithm for finding the -pairs satisfying , with , where and are coprime. It is based on Sorenson's and Jebelean's ''-ary reduction'' algorithms. We provide a formula for , the maximal number of iterations in the loop of Weber's GCD algorithm
A New Modular Division Algorithm and Applications
12 pagesInternational audienceThe present paper proposes a new parallel algorithm for the modular division , where and are positive integers . The algorithm combines the classical add-and-shift multiplication scheme with a new propagation carry technique. This ''Pen and Paper Inverse'' ({\em PPI}) algorithm, is better suited for systolic parallelization in a ''least-significant digit first'' pipelined manner. Although it is equivalent to Jebelean's modular division algorithm~\cite{jeb2} in terms of performance (time complexity, work, efficiency), the linear parallelization of the {\em PPI} algorithm improves on the latter when the input size is large. The parallelized versions of the {\em PPI} algorithm leads to various applications, such as the exact division and the digit modulus operation (dmod) of two long integers. It is also applied to the determination of the periods of rational numbers as well as their -adic expansion in any radix
A Modular Integer GCD Algorithm
This paper describes the first algorithm to compute the greatest common divisor (GCD) of two n-bit integers using a modular representation for intermediate values U, V and also for the result. It is based on a reduction step, similar to one used in the accelerated algorithm [T. Jebelean, A generalization of the binary GCD algorithm, in: ISSAC \u2793: International Symposium on Symbolic and Algebraic Computation, Kiev, Ukraine, 1993, pp. 111–116; K. Weber, The accelerated integer GCD algorithm, ACM Trans. Math. Softw. 21 (1995) 111–122] when U and V are close to the same size, that replaces U by (U-bV)/p, where p is one of the prime moduli and b is the unique integer in the interval (-p/2,p/2) such that b=UV ^-1(mod p) . When the algorithm is executed on a bit common CRCW PRAM with O(n log n log log log n) processors, it takes O(n) time in the worst case. A heuristic model of the average case yields O(n/log n) time on the same number of processors
Self-Certified Public Key Cryptographic Methodologies for Resource-Constrained Wireless Sensor Networks
As sensor networks become one of the key technologies to realize ubiquitous computing, security remains a growing concern. Although a wealth of key-generation methods have been developed during the past few decades, they cannot be directly applied to sensor network environments. The resource-constrained characteristics of sensor nodes, the ad-hoc nature of their deployment, and the vulnerability of wireless media pose a need for unique solutions.
A fundamental requisite for achieving security is the ability to provide for data con…dential- ity and node authentication. However, the scarce resources of sensor networks have rendered the direct applicability of existing public key cryptography (PKC) methodologies impractical. Elliptic Curve Cryptography (ECC) has emerged as a suitable public key cryptographic foun- dation for constrained environments, providing strong security for relatively small key sizes.
This work focuses on the clear need for resilient security solutions in wireless sensor networks (WSNs) by introducing e¢ cient PKC methodologies, explicitly designed to accommodate the distinctive attributes of resource-constrained sensor networks. Primary contributions pertain to the introduction of light-weight cryptographic arithmetic operations, and the revision of self- certi…cation (consolidated authentication and key-generation). Moreover, a low-delay group key generation methodology is devised and a denial of service mitigation scheme is introduced. The light-weight cryptographic methods developed pertain to a system-level e¢ cient utilization of the Montgomery procedure and e¢ cient calculations of modular multiplicative inverses. With respect to the latter, computational complexity has been reduced from O(m) to O(logm), with little additional memory cost.
Complementing the theoretical contributions, practical computation o¤-loading protocols have been developed along with a group key establishment scheme. Implementation on state-of- the-art sensor node platforms has yielded a comprehensive key establishment process obtained in approximately 50 ns, while consuming less than 25 mJ. These exciting results help demonstrate the technology developed and ensure its impact on next-generation sensor networks
Cofactorization on Graphics Processing Units
We show how the cofactorization step, a compute-intensive part of the relation collection phase of the number field sieve (NFS), can be farmed out to a graphics processing unit. Our implementation on a GTX 580 GPU, which is integrated with a state-of-the-art NFS implementation, can serve as a cryptanalytic co-processor for several Intel i7-3770K quad-core CPUs simultaneously. This allows those processors to focus on the memory-intensive sieving and results in more useful NFS-relations found in less time
Randomized Mixed-Radix Scalar Multiplication
A covering system of congruences can be defined as a set of congruence
relations of the form: for satisfying the property that for
every integer in , there exists at least an index such that . First, we show that most existing
scalar multiplication algorithms can be formulated in terms of covering
systems of congruences. Then, using a special form of covering systems called
exact \mbox{-covers}, we present a novel uniformly randomized scalar
multiplication algorithm with built-in protections against various types of
side-channel attacks. This algorithm can be an alternative to Coron\u27s scalar
blinding technique for elliptic curves, in particular when the choice of a
particular finite field tailored for speed compels to use a large random
factor
Recommended from our members
A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units
Multiple precision (MP) arithmetic is a core building block of a wide variety of algorithms in computational mathematics and computer science. In mathematics MP is used in computational number theory, geometric computation, experimental mathematics, and in some random matrix problems. In computer science, MP arithmetic is primarily used in cryptographic algorithms: securing communications, digital signatures, and code breaking. In most of these application areas, the factor that limits performance is the MP arithmetic. The focus of our research is to build and analyze highly optimized libraries that allow the MP operations to be offloaded from the CPU to the GPU. Our goal is to achieve an order of magnitude improvement over the CPU in three key metrics: operations per second per socket, operations per watt, and operation per second per dollar. What we find is that the SIMD design and balance of compute, cache, and bandwidth resources on the GPU is quite different from the CPU, so libraries such as GMP cannot simply be ported to the GPU. New approaches and algorithms are required to achieve high performance and high utilization of GPU resources. Further, we find that low-level ISA differences between GPU generations means that an approach that works well on one generation might not run well on the next.
Here we report on our progress towards MP arithmetic libraries on the GPU in four areas: (1) large integer addition, subtraction, and multiplication; (2) high performance modular multiplication and modular exponentiation (the key operations for cryptographic algorithms) across generations of GPUs; (3) high precision floating point addition, subtraction, multiplication, division, and square root; (4) parallel short division, which we prove is asymptotically optimal on EREW and CREW PRAMs
A library for parallel arithmetic using a modular representation
SIGLEAvailable from British Library Document Supply Centre-DSC:DXN041817 / BLDSC - British Library Document Supply CentreGBUnited Kingdo
On the Analysis of Public-Key Cryptologic Algorithms
The RSA cryptosystem introduced in 1977 by Ron Rivest, Adi Shamir and Len Adleman is the most commonly deployed public-key cryptosystem. Elliptic curve cryptography (ECC) introduced in the mid 80's by Neal Koblitz and Victor Miller is becoming an increasingly popular alternative to RSA offering competitive performance due the use of smaller key sizes. Most recently hyperelliptic curve cryptography (HECC) has been demonstrated to have comparable and in some cases better performance than ECC. The security of RSA relies on the integer factorization problem whereas the security of (H)ECC is based on the (hyper)elliptic curve discrete logarithm problem ((H)ECDLP). In this thesis the practical performance of the best methods to solve these problems is analyzed and a method to generate secure ephemeral ECC parameters is presented. The best publicly known algorithm to solve the integer factorization problem is the number field sieve (NFS). Its most time consuming step is the relation collection step. We investigate the use of graphics processing units (GPUs) as accelerators for this step. In this context, methods to efficiently implement modular arithmetic and several factoring algorithms on GPUs are presented and their performance is analyzed in practice. In conclusion, it is shown that integrating state-of-the-art NFS software packages with our GPU software can lead to a speed-up of 50%. In the case of elliptic and hyperelliptic curves for cryptographic use, the best published method to solve the (H)ECDLP is the Pollard rho algorithm. This method can be made faster using classes of equivalence induced by curve automorphisms like the negation map. We present a practical analysis of their use to speed up Pollard rho for elliptic curves and genus 2 hyperelliptic curves defined over prime fields. As a case study, 4 curves at the 128-bit theoretical security level are analyzed in our software framework for Pollard rho to estimate their practical security level. In addition, we present a novel many-core architecture to solve the ECDLP using the Pollard rho algorithm with the negation map on FPGAs. This architecture is used to estimate the cost of solving the Certicom ECCp-131 challenge with a cluster of FPGAs. Our design achieves a speed-up factor of about 4 compared to the state-of-the-art. Finally, we present an efficient method to generate unique, secure and unpredictable ephemeral ECC parameters to be shared by a pair of authenticated users for a single communication. It provides an alternative to the customary use of fixed ECC parameters obtained from publicly available standards designed by untrusted third parties. The effectiveness of our method is demonstrated with a portable implementation for regular PCs and Android smartphones. On a Samsung Galaxy S4 smartphone our implementation generates unique 128-bit secure ECC parameters in 50 milliseconds on average