21,574 research outputs found

    Enhancing an Embedded Processor Core with a Cryptographic Unit for Performance and Security

    Get PDF
    We present a set of low-cost architectural enhancements to accelerate the execution of certain arithmetic operations common in cryptographic applications on an extensible embedded processor core. The proposed enhancements are generic in the sense that they can be beneficially applied in almost any RISC processor. We implemented the enhancements in form of a cryptographic unit (CU) that offers the programmer an extended instruction set. The CU features a 128-bit wide register file and datapath, which enables it to process 128-bit words and perform 128-bit loads/stores. We analyze the speed-up factors for some arithmetic operations and public-key cryptographic algorithms obtained through these enhancements. In addition, we evaluate the hardware overhead (i.e. silicon area) of integrating the CU into an embedded RISC processor. Our experimental results show that the proposed architectural enhancements allow for a significant performance gain for both RSA and ECC at the expense of an acceptable increase in silicon area. We also demonstrate that the proposed enhancements facilitate the protection of cryptographic algorithms against certain types of side-channel attacks and present an AES implementation hardened against cache-based attacks as a case study

    Implementing a protected zone in a reconfigurable processor for isolated execution of cryptographic algorithms

    Get PDF
    We design and realize a protected zone inside a reconfigurable and extensible embedded RISC processor for isolated execution of cryptographic algorithms. The protected zone is a collection of processor subsystems such as functional units optimized for high-speed execution of integer operations, a small amount of local memory, and general and special-purpose registers. We outline the principles for secure software implementation of cryptographic algorithms in a processor equipped with the protected zone. We also demonstrate the efficiency and effectiveness of the protected zone by implementing major cryptographic algorithms, namely RSA, elliptic curve cryptography, and AES in the protected zone. In terms of time efficiency, software implementations of these three cryptographic algorithms outperform equivalent software implementations on similar processors reported in the literature. The protected zone is designed in such a modular fashion that it can easily be integrated into any RISC processor; its area overhead is considerably moderate in the sense that it can be used in vast majority of embedded processors. The protected zone can also provide the necessary support to implement TPM functionality within the boundary of a processor

    Modular design of data-parallel graph algorithms

    Get PDF
    Amorphous Data Parallelism has proven to be a suitable vehicle for implementing concurrent graph algorithms effectively on multi-core architectures. In view of the growing complexity of graph algorithms for information analysis, there is a need to facilitate modular design techniques in the context of Amorphous Data Parallelism. In this paper, we investigate what it takes to formulate algorithms possessing Amorphous Data Parallelism in a modular fashion enabling a large degree of code re-use. Using the betweenness centrality algorithm, a widely popular algorithm in the analysis of social networks, we demonstrate that a single optimisation technique can suffice to enable a modular programming style without loosing the efficiency of a tailor-made monolithic implementation

    On Designing Multicore-aware Simulators for Biological Systems

    Full text link
    The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It often is an enlightening technique, which may however result in being computational expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas of derived from parameter sweep). Proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments) that is carried out according to proposed solutions by way of the FastFlow programming framework making possible fast development and efficient execution on multi-cores.Comment: 19 pages + cover pag
    • …
    corecore