
    High-speed polynomial basis multipliers over GF(2^m) for special pentanomials

    Efficient hardware implementations of arithmetic operations in the Galois field GF(2^m) are highly desirable for several applications, such as coding theory, computer algebra and cryptography. Among these operations, multiplication is of special interest because it is considered the most important building block. Therefore, high-speed algorithms and hardware architectures for computing multiplication are in high demand. In this paper, bit-parallel polynomial basis multipliers over the binary field GF(2^m) generated using type II irreducible pentanomials are considered. The multiplier presented here has the lowest time complexity known to date among similar multipliers based on this type of irreducible pentanomial.
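
As a point of reference for the operation this abstract is about, polynomial basis multiplication in GF(2^m) can be sketched in software as a shift-and-add loop with reduction by the field polynomial. This is only a bit-serial illustration, not the bit-parallel architecture of the paper. The function name `gf2m_mul` and the choice of field are assumptions: the example uses the well-known irreducible pentanomial x^8 + x^4 + x^3 + x + 1 from AES, which is not claimed to be one of the type II pentanomials the paper targets.

```python
# Assumed example field: GF(2^8) with x^8 + x^4 + x^3 + x + 1 (the AES
# polynomial). It is an irreducible pentanomial, chosen only because its
# products are easy to check against published values.
AES_PENTANOMIAL = 0b1_0001_1011  # 0x11B
AES_DEGREE = 8


def gf2m_mul(a: int, b: int, modulus: int, m: int) -> int:
    """Multiply a and b in GF(2^m) using the polynomial basis.

    Field elements are integers whose bit i is the coefficient of x^i.
    `modulus` is the irreducible polynomial including its x^m term.
    Both inputs are expected to be already reduced (less than 2^m).
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # addition in GF(2) is XOR
        b >>= 1
        a <<= 1                  # multiply the running operand by x
        if (a >> m) & 1:
            a ^= modulus         # reduce as soon as degree m is reached
    return result


if __name__ == "__main__":
    # Worked example from the AES specification: {57} * {83} = {c1}
    print(hex(gf2m_mul(0x57, 0x83, AES_PENTANOMIAL, AES_DEGREE)))
```

A bit-parallel hardware multiplier computes all output coefficients at once from a fixed network of AND and XOR gates; the loop above only shows the arithmetic that such a network implements.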

    High Speed and Low-Complexity Hardware Architectures for Elliptic Curve-Based Crypto-Processors

    Elliptic curve cryptography (ECC) has been identified as an efficient scheme for public-key cryptography. This thesis studies efficient implementation of ECC crypto-processors on hardware platforms in a bottom-up approach. We first study efficient and low-complexity architectures for finite field multiplication over Gaussian normal basis (GNB). We propose three new low-complexity digit-level architectures for finite field multiplication. The architectures are modified to make them more suitable for hardware implementation, with a particular focus on reducing area usage. Then, for the first time, we propose a hybrid digit-level multiplier architecture which performs two multiplications together (double-multiplication) in the same number of clock cycles as a single multiplication. We propose a new hardware architecture for point multiplication on the newly introduced binary Edwards and generalized Hessian curves. We investigate higher-level parallelization and lower-level scheduling for point multiplication on these curves. Also, we propose a highly parallel architecture for point multiplication on Koblitz curves by modifying the addition formulation. Several FPGA implementations exploiting these modifications are presented in this thesis. We employ the proposed hybrid multiplier architecture to reduce the latency of point multiplication in ECC crypto-processors as well as of double-exponentiation. This scheme is the first known method to increase the speed of point multiplication whenever parallelization fails due to data dependencies among lower-level arithmetic computations. Our comparison results show that our proposed multiplier architectures outperform their counterparts available in the literature. Furthermore, fast computation of point multiplication on different binary elliptic curves is achieved.

    Low-Resource and Fast Elliptic Curve Implementations over Binary Edwards Curves

    Elliptic curve cryptography (ECC) is an ideal choice for low-resource applications because it provides the same level of security with smaller key sizes than other existing public key encryption schemes. For low-resource applications, designing efficient functional units for elliptic curve computations over binary fields results in an effective platform for an embedded co-processor. This thesis investigates co-processor designs for area-constrained devices. In particular, we discuss an implementation utilizing state-of-the-art binary Edwards curve equations for mixed point addition and doubling. The binary Edwards curve offers the security advantage that it is complete and is, therefore, immune to the exceptional points attack. In conjunction with the Montgomery ladder, such a curve is naturally immune to most types of simple power and timing attacks. Finite field operations were performed in the small and efficient Gaussian normal basis. The recently presented formulas for mixed point addition by K. Kim, C. Lee, and C. Negre at Indocrypt 2014 were found to be invalid, but were corrected such that the speed and register usage were maintained. We utilize the corrected mixed point addition and doubling formulas to achieve a secure, but still fast, implementation of point multiplication on binary Edwards curves. Our synthesis results over NIST recommended fields for ECC indicate that the proposed co-processor requires about 50% fewer clock cycles for point multiplication and occupies a similar silicon area when compared to the most recent design in the literature.

    Techniques for Efficient Implementation of FIR and Particle Filtering


    Compilers that learn to optimise: a probabilistic machine learning approach

    Compiler optimisation is the process of making a compiler produce better code, i.e. code that, for example, runs faster on a target architecture. Although numerous program transformations for optimisation have been proposed in the literature, these transformations are not always beneficial and they can interact in very complex ways. Traditional approaches adopted by compiler writers fix the order of the transformations and decide when and how these transformations should be applied to a program by using hard-coded heuristics. However, these heuristics require a lot of time and effort to construct and may sacrifice performance on programs they have not been tuned for. This thesis proposes a probabilistic machine learning solution to the compiler optimisation problem that automatically determines "good" optimisation strategies for programs. This approach uses predictive modelling in order to search the space of compiler transformations. Unlike most previous work that learns when/how to apply a single transformation in isolation or a fixed-order set of transformations, the techniques proposed in this thesis are capable of tackling the general problem of predicting "good" sequences of compiler transformations. This is achieved by exploiting transference across programs with two different techniques: Predictive Search Distributions (PSD) and multi-task Gaussian process prediction (multi-task GP). While the former directly addresses the problem of predicting "good" transformation sequences, the latter learns regression models (or proxies) of the performance of the programs in order to rapidly scan the space of transformation sequences. Both methods, PSD and multi-task GP, are formulated as general machine learning techniques. In particular, the PSD method is proposed in order to speed up search in combinatorial optimisation problems by learning a distribution over good solutions on a set of problem instances and using that distribution to search the optimisation space of a problem that has not been seen before. Likewise, multi-task GP is proposed as a general method for multi-task learning that directly models the correlation between several machine learning tasks, exploiting the shared information across the tasks. Additionally, this thesis presents an extension to the well-known analysis of variance (ANOVA) methodology in order to deal with sequence data. This extension is used to address the problem of optimisation space characterisation by identifying and quantifying the main effects of program transformations and their interactions. Finally, the machine learning methods proposed are successfully applied to a data set that has been generated as a result of the application of source-to-source transformations to 12 C programs from the UTDSP benchmark suite.

    Automated recognition of handwritten mathematics

    Most software programs that deal with mathematical objects require input expressions to be linearized using somewhat awkward and unfamiliar string-based syntax. It is natural to desire a method for inputting mathematics using the same two-dimensional syntax employed with pen and paper, and the increasing prevalence of pen- and touch-based interfaces causes this topic to be of practical as well as theoretical interest. Accurately recognizing two-dimensional mathematical notation is a difficult problem that requires not only theoretical advancement over the traditional theories of string-based languages, but also careful consideration of runtime efficiency, data organization, and other practical concerns that arise during system construction. This thesis describes the math recognizer used in the MathBrush pen-math system. At a high level, the two-dimensional syntax of mathematical writing is formalized using a relational grammar. Rather than reporting a single recognition result, all recognizable interpretations of the input are simultaneously represented in a data structure called a parse forest. Individual interpretations may be extracted from the forest and reported one by one as the user requests them. These parsing techniques necessitate robust tree scoring functions, which themselves rely on several lower-level recognition processes for stroke grouping, symbol recognition, and spatial relation classification. The thesis covers the recognition, parsing, and scoring aspects of the MathBrush recognizer, as well as the algorithms and assumptions necessary to combine those systems and formalisms together into a useful and efficient software system. The effectiveness of the resulting system is measured through two accuracy evaluations. One evaluation uses a novel metric based on user effort, while the other replicates the evaluation process of an international accuracy competition. 
The evaluations show that not only is the performance of the MathBrush recognizer improving over time, but it is also significantly more accurate than other academic recognition systems.

    Automatic Performance Optimization of Stencil Codes

    Stencil codes are a widely used class of codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their very simple structure, stencil codes are hard to optimize since only a few computations are performed while a comparatively large number of values have to be accessed, i.e., stencil codes usually have a very low computational intensity. Moreover, the set of optimizations and their parameters also depend on the hardware on which the code is executed. In short, current production compilers are not able to fully optimize this class of codes, and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of space and time tiling is able to increase the data locality, which significantly reduces the memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil can be accelerated by a factor of 3. This optimization can target basically any stencil code, while others are more specialized. For example, support for arbitrary linear data layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. On the one hand, an optimized data layout for such kernels reduces the bandwidth requirements while, on the other hand, it simplifies an explicit vectorization. Other notable optimizations described in detail are redundancy elimination techniques to eliminate common subexpressions both in a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously. In combination, these optimizations are able to increase the performance not only of the model problem given by Poisson’s equation, but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal and non-Newtonian fluid flow.
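
To make the structure of a stencil code concrete, the following is a minimal, unoptimized sketch of one Jacobi sweep over a three-dimensional grid using the 7-point neighborhood. It assumes the simplest case, the discrete Laplace equation with a zero right-hand side, so each interior point becomes the average of its six axis neighbors. The function name `jacobi_sweep` and the nested-list grid are illustrative choices, not taken from the thesis, and none of the tiling, layout or vectorization optimizations described above are applied.

```python
def jacobi_sweep(grid):
    """Return a new grid after one Jacobi sweep with the 7-point stencil.

    `grid` is a nested list indexed as grid[i][j][k]. Boundary values are
    copied unchanged; every interior point is recomputed from the six
    neighbors of the *old* grid, which is what makes this a Jacobi update.
    """
    n_x, n_y, n_z = len(grid), len(grid[0]), len(grid[0][0])
    # Separate output array: reads and writes never touch the same buffer.
    new = [[row[:] for row in plane] for plane in grid]
    for i in range(1, n_x - 1):
        for j in range(1, n_y - 1):
            for k in range(1, n_z - 1):
                new[i][j][k] = (
                    grid[i - 1][j][k] + grid[i + 1][j][k]
                    + grid[i][j - 1][k] + grid[i][j + 1][k]
                    + grid[i][j][k - 1] + grid[i][j][k + 1]
                ) / 6.0
    return new


if __name__ == "__main__":
    # 3x3x3 grid: boundary fixed at 1.0, single interior point starts at 0.0
    grid = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    grid[1][1][1] = 0.0
    print(jacobi_sweep(grid)[1][1][1])
```

Each update performs five additions and one division but reads six values, which illustrates the low computational intensity the abstract refers to.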

    Automated Design Space Exploration and Datapath Synthesis for Finite Field Arithmetic with Applications to Lightweight Cryptography

    Today, emerging technologies are reaching astronomical proportions. For example, the Internet of Things has numerous applications and consists of countless different devices using different technologies with different capabilities. But the one invariant is their connectivity. Consequently, secure communications, and cryptographic hardware as a means of providing them, are faced with new challenges. Cryptographic algorithms intended for hardware implementations must be designed with a good trade-off between implementation efficiency and sufficient cryptographic strength. Finite fields are widely used in cryptography. Examples of algorithm design choices related to finite field arithmetic are the field size, which arithmetic operations to use, how to represent the field elements, etc. As there are many parameters to be considered and analyzed, an automation framework is needed. This thesis proposes a framework for automated design, implementation and verification of finite field arithmetic hardware. The underlying motif throughout this work is “math meets hardware”. The automation framework is designed to bring the awareness of underlying mathematical structures to the hardware design flow. It is implemented in GAP, an open source computer algebra system that can work with finite fields and has symbolic computation capabilities. The framework is roughly divided into two phases, the architectural decisions and the automated design generation. The architectural decisions phase supports parameter search and produces a list of candidates. The automated design generation phase is invoked for each candidate, and the generated VHDL files are passed on to conventional synthesis tools. The candidates and their implementation results form the design space, and the framework allows rapid design space exploration in a systematic way. In this thesis, design space exploration is focused on finite field arithmetic. 
Three distinctive features of the proposed framework are the structure of finite fields, tower field support, and on-the-fly submodule generation. Each finite field used in the design is represented as both a field and its corresponding vector space. It is easy for a designer to switch between fields and vector spaces, but a strict distinction between the two is necessary for hierarchical designs. When an expression is defined over an extension field, the top-level module contains element signals and submodules for arithmetic operations on those signals. The submodules are generated with corresponding vector signals, and the arithmetic operations are then performed on the coordinates. For tower fields, the submodules are generated for the subfield operations, and the design is generated in a top-down fashion. The binding of expressions to the appropriate finite fields or vector spaces and a set of customized methods allow the on-the-fly generation of expressions for implementation of arithmetic operations, and hence submodule generation. In light of the NIST Lightweight Cryptography Project (LWC), this work focuses mainly on small finite fields. The thesis illustrates the impact of hardware implementation results during the design process of WAGE, a Round 2 candidate in the NIST LWC standardization competition. WAGE is a hardware-oriented authenticated encryption scheme. The parameter selection for WAGE was aimed at balancing security and hardware implementation area, using hardware implementation results for many design decisions, for example field size, representation of field elements, etc. In the proposed framework, the components of WAGE are used as an example to illustrate different automation flows and demonstrate the design space exploration on a real-world algorithm.

    The design and implementation of a relational programming system.

    The declarative class of computer languages consists mainly of two paradigms - the logic and the functional. Much research has been devoted in recent years to the integration of the two with the aim of securing the advantages of both without retaining their disadvantages. To date this research has, arguably, been less fruitful than initially hoped. A large number of composite functional/logical languages have been proposed but have generally been marred by the lack of a firm, cohesive, mathematical basis. More recently new declarative paradigms, equational and constraint languages, have been advocated. These however do not fully encompass those features we perceive as being central to functional and logic languages. The crucial functional features are higher-order definitions, static polymorphic typing, applicative expressions and laziness. The crucial logic features are ability to reason about both functional and non-functional relationships and to handle computations involving search. This thesis advocates a new declarative paradigm which lies midway between functional and logic languages - the so-called relational paradigm. In a relational language program and data alike are denoted by relations. All expressions are relations constructed from simpler expressions using operators which form a relational algebra. The impetus for use of relations in a declarative language comes from observations concerning their connection to functional and logic programming. Relations are mathematically more general than functions, modelling non-functional as well as functional relationships. They also form the basis of many logic languages, for example, Prolog. This thesis proposes a new relational language based entirely on binary relations, named Drusilla. We demonstrate the functional and logic aspects of Drusilla. 
It retains the higher-order objects and polymorphism found in modern functional languages but handles non-determinism and models relationships between objects in the manner of a logic language, with the notion of an algorithm being composed of logic and control elements. Different programming styles - functional, logic and relational - are illustrated. However, such expressive power does not come for free; it has associated with it a high cost of implementation. Two main techniques are used in the necessarily complex language interpreter. A type inference system checks programs to ensure they are meaningful and simultaneously performs automatic representation selection for relations. A symbolic manipulation system transforms programs to improve the efficiency of expressions and to increase the number of possible representations for relations while preserving program meaning.
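
The idea of building programs from binary relations combined by algebraic operators can be illustrated with a small sketch. It models a relation as a finite set of pairs and defines composition and converse, two standard relational algebra operators. This is plain Python, not Drusilla syntax, and the names `compose`, `converse`, `is_functional` and the sample data are assumptions made for illustration only.

```python
def compose(r, s):
    """Relational composition: (a, c) holds if a r b and b s c for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def converse(r):
    """Converse relation: swap the two components of every pair."""
    return {(b, a) for (a, b) in r}


def is_functional(r):
    """True if every left element is related to at most one right element."""
    lefts = [a for (a, _) in r]
    return len(lefts) == len(set(lefts))


# Hypothetical sample data: "parent" is non-functional because alice
# is related to two children.
parent = {("alice", "bob"), ("alice", "carol"), ("bob", "dave")}

grandparent = compose(parent, parent)   # built purely from relation operators
child = converse(parent)                # the same data, queried "backwards"

if __name__ == "__main__":
    print(grandparent)
    print(is_functional(parent), is_functional(child))
```

The converse shows why relations are more general than functions: `parent` relates one person to several children, while its converse `child` happens to be functional for this data.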