66 research outputs found

    Efficient software implementation of elliptic curves and bilinear pairings

    Get PDF
    Orientador: Júlio César Lopez HernándezTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: O advento da criptografia assimétrica ou de chave pública possibilitou a aplicação de criptografia em novos cenários, como assinaturas digitais e comércio eletrônico, tornando-a componente vital para o fornecimento de confidencialidade e autenticação em meios de comunicação. Dentre os métodos mais eficientes de criptografia assimétrica, a criptografia de curvas elípticas destaca-se pelos baixos requisitos de armazenamento para chaves e custo computacional para execução. A descoberta relativamente recente da criptografia baseada em emparelhamentos bilineares sobre curvas elípticas permitiu ainda sua flexibilização e a construção de sistemas criptográficos com propriedades inovadoras, como sistemas baseados em identidades e suas variantes. Porém, o custo computacional de criptossistemas baseados em emparelhamentos ainda permanece significativamente maior do que os assimétricos tradicionais, representando um obstáculo para sua adoção, especialmente em dispositivos com recursos limitados. As contribuições deste trabalho objetivam aprimorar o desempenho de criptossistemas baseados em curvas elípticas e emparelhamentos bilineares e consistem em: (i) implementação eficiente de corpos binários em arquiteturas embutidas de 8 bits (microcontroladores presentes em sensores sem fio); (ii) formulação eficiente de aritmética em corpos binários para conjuntos vetoriais de arquiteturas de 64 bits e famílias mais recentes de processadores desktop dotadas de suporte nativo à multiplicação em corpos binários; (iii) técnicas para implementação serial e paralela de curvas elípticas binárias e emparelhamentos bilineares simétricos e assimétricos definidos sobre corpos primos ou binários. Estas contribuições permitiram obter significativos ganhos de desempenho e, conseqüentemente, uma série de recordes de velocidade para o cálculo de diversos algoritmos criptográficos relevantes em arquiteturas modernas que vão de sistemas embarcados de 8 bits a processadores com 8 coresAbstract: The development of asymmetric or public key cryptography made possible new applications of cryptography such as digital signatures and electronic commerce. Cryptography is now a vital component for providing confidentiality and authentication in communication infra-structures. Elliptic Curve Cryptography is among the most efficient public-key methods because of its low storage and computational requirements. The relatively recent advent of Pairing-Based Cryptography allowed the further construction of flexible and innovative cryptographic solutions like Identity-Based Cryptography and variants. However, the computational cost of pairing-based cryptosystems remains significantly higher than traditional public key cryptosystems and thus an important obstacle for adoption, specially in resource-constrained devices. The main contributions of this work aim to improve the performance of curve-based cryptosystems, consisting of: (i) efficient implementation of binary fields in 8-bit microcontrollers embedded in sensor network nodes; (ii) efficient formulation of binary field arithmetic in terms of vector instructions present in 64-bit architectures, and on the recently-introduced native support for binary field multiplication in the latest Intel microarchitecture families; (iii) techniques for serial and parallel implementation of binary elliptic curves and symmetric and asymmetric pairings defined over prime and binary fields. These contributions produced important performance improvements and, consequently, several speed records for computing relevant cryptographic algorithms in modern computer architectures ranging from embedded 8-bit microcontrollers to 8-core processorsDoutoradoCiência da ComputaçãoDoutor em Ciência da Computaçã

    Elliptic Curve Cryptography on Modern Processor Architectures

    Get PDF
    Abstract Elliptic Curve Cryptography (ECC) has been adopted by the US National Security Agency (NSA) in Suite "B" as part of its "Cryptographic Modernisation Program ". Additionally, it has been favoured by an entire host of mobile devices due to its superior performance characteristics. ECC is also the building block on which the exciting field of pairing/identity based cryptography is based. This widespread use means that there is potentially a lot to be gained by researching efficient implementations on modern processors such as IBM's Cell Broadband Engine and Philip's next generation smart card cores. ECC operations can be thought of as a pyramid of building blocks, from instructions on a core, modular operations on a finite field, point addition & doubling, elliptic curve scalar multiplication to application level protocols. In this thesis we examine an implementation of these components for ECC focusing on a range of optimising techniques for the Cell's SPU and the MIPS smart card. We show significant performance improvements that can be achieved through of adoption of EC

    Theory and Practice of Cryptography and Network Security Protocols and Technologies

    Get PDF
    In an age of explosive worldwide growth of electronic data storage and communications, effective protection of information has become a critical requirement. When used in coordination with other tools for ensuring information security, cryptography in all of its applications, including data confidentiality, data integrity, and user authentication, is a most powerful tool for protecting information. This book presents a collection of research work in the field of cryptography. It discusses some of the critical challenges that are being faced by the current computing world and also describes some mechanisms to defend against these challenges. It is a valuable source of knowledge for researchers, engineers, graduate and doctoral students working in the field of cryptography. It will also be useful for faculty members of graduate schools and universities

    Doctor of Philosophy in Computer Science

    Get PDF
    dissertationControl-flow analysis of higher-order languages is a difficult problem, yet an important one. It aids in enabling optimizations, improved reliability, and improved security of programs written in these languages. This dissertation explores three techniques to improve the precision and speed of a small-step abstract interpreter: using a priority work list, environment unrolling, and strong function call. In an abstract interpreter, the interpreter is no longer deterministic and choices can be made in how the abstract state space is explored and trade-offs exist. A priority queue is one option. There are also many ways to abstract the concrete interpreter. Environment unrolling gives a slightly different approach than is usually taken, by holding off abstraction in order to gain precision, which can lead to a faster analysis. Strong function call is an approach to clean up some of the imprecision when making a function call that is introduced when abstractly interpreting a program. An alternative approach to building an abstract interpreter to perform static analysis is through the use of constraint solving. Existing techniques to do this have been developed over the last several decades. This dissertation maps these constraints to three different problems, allowing control-flow analysis of higher-order languages to be solved with tools that are already mature and well developed. The control-flow problem is mapped to pointer analysis of first-order languages, SAT, and linear-algebra operations. These mappings allow for fast and parallel implementations of control-flow analysis of higher-order languages. A recent development in the field of static analysis has been pushdown control-flow analysis, which is able to precisely match calls and returns, a weakness in the existing techniques. This dissertation also provides an encoding of pushdown control-flow analysis to linear-algebra operations. In the process, it demonstrates that under certain conditions (monovariance and flow insensitivity) that in terms of precision, a pushdown control-flow analysis is in fact equivalent to a direct style constraint-based formulation

    Hardware processors for pairing-based cryptography

    Get PDF
    Bilinear pairings can be used to construct cryptographic systems with very desirable properties. A pairing performs a mapping on members of groups on elliptic and genus 2 hyperelliptic curves to an extension of the finite field on which the curves are defined. The finite fields must, however, be large to ensure adequate security. The complicated group structure of the curves and the expensive field operations result in time consuming computations that are an impediment to the practicality of pairing-based systems. The Tate pairing can be computed efficiently using the ɳT method. Hardware architectures can be used to accelerate the required operations by exploiting the parallelism inherent to the algorithmic and finite field calculations. The Tate pairing can be performed on elliptic curves of characteristic 2 and 3 and on genus 2 hyperelliptic curves of characteristic 2. Curve selection is dependent on several factors including desired computational speed, the area constraints of the target device and the required security level. In this thesis, custom hardware processors for the acceleration of the Tate pairing are presented and implemented on an FPGA. The underlying hardware architectures are designed with care to exploit available parallelism while ensuring resource efficiency. The characteristic 2 elliptic curve processor contains novel units that return a pairing result in a very low number of clock cycles. Despite the more complicated computational algorithm, the speed of the genus 2 processor is comparable. Pairing computation on each of these curves can be appealing in applications with various attributes. A flexible processor that can perform pairing computation on elliptic curves of characteristic 2 and 3 has also been designed. An integrated hardware/software design and verification environment has been developed. This system automates the procedures required for robust processor creation and enables the rapid provision of solutions for a wide range of cryptographic applications

    Transputer Implementation for the Shell Model and Sd Shell Calculations

    Get PDF
    This thesis consists of two parts. The first part discusses a new Shell model implementation based on communicating sequential processes. The second part contains different shell model calculations, which have been done using an earlier implementation. Sequential processing computers appear to be fast reaching their upper limits of efficiency. Presently they can perform one machine operation in every clock cycle and the silicon technology also seems to have reached its physical limits of miniaturization. Hence new software/hardware approaches should be investigated in order to meet growing computational requirements. Parallel processing has been demonstrated to be one alternative to achieve this objective. But the major problem with this approach is that many algorithms used for the solution of physical problems are not suitable for distribution over a number of processors. In part one of this work we have identified this concurrency in the shell model calculations and implemented it on the Meiko Computing Surface. Firstly we have explained the motivation for this project and then give a detailed comparison of different hardware/software that has been available to us and reasons for our preferred choice. Similarly, we also outline the advantages/disadvantages of the available parallel/sequential languages before choosing parallel C to be our language of implementation. We describe our new serial implementation DASS, the Dynamic And Structured Shell model, which forms basis for the parallel version. We have developed a new algorithm for the phase calculation of Slater Determinants, which is, superior to the previously used occupancy representation method. Both our serial and parallel implementations have adopted this representation. The PARALLEL GLASNAST, as we call it, PARALLEL GLASgow Nuclear Algorithmic Technique, is our complete implementation of the inherent parallelism in Shell model calculation and has been described in detail. It is actually based on splitting the whole calculation into three tasks, which can be distributed on the number of processors required by the chosen topology, and executed concurrently. We also give a detailed discussion of the communication/ synchronization protocols which preserve the available concurrency. We have achieved a complete overlap of the the main tasks, one responsible for arithmetically intensive operations and the other doing searching among, possibly, millions of states. It demonstrates that the implementation of these tasks has got enough built in flexibility that they could be run on any number of processors. Execution times for one and three transputers have been obtained for 28Si, which are fairly good. We have also undertaken a detailed analysis of how the amount of communication (traffic) between processors changes with the increase in the number of states. Part two describes shell model calculations for mass 21 nuclei. Previous many calculations have not taken into account the Coulomb's interaction, which is responsible for differences between mirror nuclei. They also do not use the valuable information on nucleon occupancies. We have made extensive calculations for the six isobars in mass 21 using CWC, PW and USD interactions. The results obtained in this case include, energy, spin, isospin and electromagnetic transition rates. These result are discussed and conclusions drawn. We concentrate on the comparison of the properties in of each mirror pairs. This comparison is supplemented by tables, energy level diagrams and occupancy diagrams. As we consider mirror pair individually, the mixing of states, which is caused by the short range nuclear force and the Coulomb force, becomes more evident. The other important thing we have noticed is, that some pairs of states swap their places, between a mirror pair, on the occupancy diagram, suggesting that their wave functions might have been swapped. We have undertaken a detailed study to discover any swapping states. The tests applied to confirm this include comparison of energy, electromagnetic properties and the occupancy information obtained with different interactions. We find that only the 91, 92 states in Al have swapped over. We also report some real energy gaps which exist on the basis of our calculations for Al

    GPU-based Parallel Computing Models and Implementations for Two-party Privacy-preserving Protocols

    Get PDF
    In (two-party) privacy-preserving-based applications, two users use encrypted inputs to compute a function without giving out plaintext of their input values. Privacy-preserving computing algorithms have to utilize a large amount of computing resources to handle the encryption-decryption operations. In this dissertation, we study optimal utilization of computing resources on the graphic processor unit (GPU) architecture for privacy-preserving protocols based on secure function evaluation (SFE) and the Elliptic Curve Cryptographic (ECC) and related algorithms. A number of privacy-preserving protocols are implemented, including private set intersection (PSI), secret handshaking (SH), secure Edit distance (ED) and Smith-Waterman (SW) problems. PSI is chosen to represent ECC point multiplication related computations, SH for bilinear pairing, and the last two for SFE-based dynamic programming (DP) problems. They represent different types of computations, so that in-depth understanding of the benefits and limitations of the GPU architecture for privacy preserving protocols is gained. For SFE-based ED and SW problems, a wavefront parallel computing model on the CPU-GPU architecture under the semi-honest security model is proposed. Low level parallelization techniques for GPU-based gate (de-)garbler, synchronized parallel memory access, pipelining, and general GPU resource mapping policies are developed. This dissertation shows that the GPU architecture can be fully utilized to speed up SFE-based ED and SW algorithms, which are constructed with billions of garbled gates, on a contemporary GPU card GTX-680, with very little waste of processing cycles or memory space. For PSI and SH protocols and underlying ECC algorithms, the analysis in this research shows that the conventional Montgomery-based number system is more friendly to the GPU architecture than the Residue Number System (RNS) is. Analysis on experiment results further shows that the lazy reduction in higher extension fields can have performance benefits only when the GPU architecture has enough fast memory. The resulting Elliptic curve Arithmetic GPU Library (EAGL) can run 3350.9 R-ate (bilinear) pairing/sec, and 47000 point multiplication/sec at the 128-bit security level, on one GTX-680 card. The primary performance bottleneck is found to be lacking of advanced memory management functions in the contemporary GPU architecture for bilinear pairing operations. Substantial performance gain can be expected when the on-chip memory size and/or more advanced memory prefetching mechanisms are supported in future generations of GPUs

    High-Speed Elliptic Curve and Pairing-Based Cryptography

    Get PDF
    Elliptic Curve Cryptography (ECC), independently proposed by Miller [Mil86] and Koblitz [Kob87] in mid 80’s, is finding momentum to consolidate its status as the public-key system of choice in a wide range of applications and to further expand this position to settings traditionally occupied by RSA and DL-based systems. The non-existence of known subexponential attacks on this cryptosystem directly translates to shorter keylengths for a given security level and, consequently, has led to implementations with better bandwidth usage, reduced power and memory requirements, and higher speeds. Moreover, the dramatic entry of pairing-based cryptosystems defined on elliptic curves at the beginning of the new millennium has opened the possibility of a plethora of innovative applications, solving in some cases longstanding problems in cryptography. Nevertheless, public-key cryptography (PKC) is still relatively expensive in comparison with its symmetric-key counterpart and it remains an open challenge to reduce further the computing cost of the most time-consuming PKC primitives to guarantee their adoption for secure communication in commercial and Internet-based applications. The latter is especially true for pairing computations. Thus, it is of paramount importance to research methods which permit the efficient realization of Elliptic Curve and Pairing-based Cryptography on the several new platforms and applications. This thesis deals with efficient methods and explicit formulas for computing elliptic curve scalar multiplication and pairings over fields of large prime characteristic with the objective of enabling the realization of software implementations at very high speeds. To achieve this main goal in the case of elliptic curves, we accomplish the following tasks: identify the elliptic curve settings with the fastest arithmetic; accelerate the precomputation stage in the scalar multiplication; study number representations and scalar multiplication algorithms for speeding up the evaluation stage; identify most efficient field arithmetic algorithms and optimize them; analyze the architecture of the targeted platforms for maximizing the performance of ECC operations; identify most efficient coordinate systems and optimize explicit formulas; and realize implementations on x86-64 processors with an optimal algorithmic selection among all studied cases. In the case of pairings, the following tasks are accomplished: accelerate tower and curve arithmetic; identify most efficient tower and field arithmetic algorithms and optimize them; identify the curve setting with the fastest arithmetic and optimize it; identify state-of-the-art techniques for the Miller loop and final exponentiation; and realize an implementation on x86-64 processors with optimal algorithmic selection. The most outstanding contributions that have been achieved with the methodologies above in this thesis can be summarized as follows: • Two novel precomputation schemes are introduced and shown to achieve the lowest costs in the literature for different curve forms and scalar multiplication primitives. The detailed cost formulas of the schemes are derived for most relevant scenarios. • A new methodology based on the operation cost per bit to devise highly optimized and compact multibase algorithms is proposed. Derived multibase chains using bases {2,3} and {2,3,5} are shown to achieve the lowest theoretical costs for scalar multiplication on certain curve forms and for scenarios with and without precomputations. In addition, the zero and nonzero density formulas of the original (width-w) multibase NAF method are derived by using Markov chains. The application of “fractional” windows to the multibase method is described together with the derivation of the corresponding density formulas. • Incomplete reduction and branchless arithmetic techniques are optimally combined for devising high-performance field arithmetic. Efficient algorithms for “small” modular operations using suitably chosen pseudo-Mersenne primes are carefully analyzed and optimized for incomplete reduction. • Data dependencies between contiguous field operations are discovered to be a source of performance degradation on x86-64 processors. Three techniques for reducing the number of potential pipeline stalls due to these dependencies are proposed: field arithmetic scheduling, merging of point operations and merging of field operations. • Explicit formulas for two relevant cases, namely Weierstrass and Twisted Edwards curves over and , are carefully optimized employing incomplete reduction, minimal number of operations and reduced number of data dependencies between contiguous field operations. • Best algorithms for the field, point and scalar arithmetic, studied or proposed in this thesis, are brought together to realize four high-speed implementations on x86-64 processors at the 128-bit security level. Presented results set new speed records for elliptic curve scalar multiplication and introduce up to 34% of cost reduction in comparison with the best previous results in the literature. • A generalized lazy reduction technique that enables the elimination of up to 32% of modular reductions in the pairing computation is proposed. Further, a methodology that keeps intermediate results under Montgomery reduction boundaries maximizing operations without carry checks is introduced. Optimized formulas for the popular tower are explicitly stated and a detailed operation count that permits to determine the theoretical cost improvement attainable with the proposed method is carried out for the case of an optimal ate pairing on a Barreto-Naehrig (BN) curve at the 128-bit security level. • Best algorithms for the different stages of the pairing computation, including the proposed techniques and optimizations, are brought together to realize a high-speed implementation at the 128-bit security level. Presented results on x86-64 processors set new speed records for pairings, introducing up to 34% of cost reduction in comparison with the best published result. From a general viewpoint, the proposed methods and optimized formulas have a practical impact in the performance of cryptographic protocols based on elliptic curves and pairings in a wide range of applications. In particular, the introduced implementations represent a direct and significant improvement that may be exploited in performance-dominated applications such as high-demand Web servers in which millions of secure transactions need to be generated
    corecore