183 research outputs found

    Design of a Modular Exponentiation Module for an RSA Cryptographic Coprocessor with Power Analysis Countermeasures

    Rivest-Shamir-Adleman (RSA) is a widely used public-key cryptographic method. The main operation performed in this method, for both encryption and decryption, is modular exponentiation. The way modular exponentiation is computed makes the system vulnerable to side-channel attacks. Side-channel attacks target the physical implementation rather than weaknesses in the algorithm. In particular, power analysis attacks are a type of side-channel attack that extracts information from the power consumption trace. The main goals of this thesis are to design, verify, and obtain the specifications of a coprocessor resistant to Simple Power Analysis (SPA). A hardware coprocessor is introduced because the case study in this thesis requires a fast implementation of the RSA method. The proposed design works with 4096-bit keys, following the recommendations of NIST Special Publication 800-57 Part 1. Thus, the design focuses on area optimization while dealing with large keys. The design is presented in an accessible schematic form, while the fully functional version is described in the hardware description language VHDL. Using Cadence® software, the design is simulated and the implemented countermeasures are verified with a 16-bit version. The proposed countermeasures aim not to increase power consumption or execution time. For comparison, an SPA-vulnerable reference version is also designed and simulated. The power traces of both versions are obtained to assess the effectiveness of the applied countermeasure. To obtain realistic results, the design has been synthesized with a 1.2 V standard 65 nm CMOS library. The final proposed solution addresses the area problem by using only one 4098-bit adder/subtractor in a sequential Montgomery Product (MP) scheme. This adder/subtractor is a Parallel Prefix Adder (PPA), chosen to reduce delay; in particular, the Ladner-Fischer topology is used. This reduces the number of wire tracks and logic levels, which helps to synthesize such a large adder. The specifications obtained for the 4096-bit version allow the main system clock to run at about 100 MHz. In the SPA-resistant version, this means that a modular exponentiation can be computed, on average, in about 504 ms.
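As a rough illustration of the ideas in this abstract, the sketch below builds modular exponentiation on a Montgomery product and uses a Montgomery ladder, which performs one multiplication and one squaring for every exponent bit regardless of its value. The abstract does not say which SPA countermeasure the thesis implements, so the ladder is an assumption chosen as one common option. The function names (`mont_product`, `modexp_ladder`) are hypothetical. Python's big integers stand in for the sequential adder/subtractor hardware, so this is a functional model only and says nothing about power behaviour.

```python
def mont_product(a, b, n, n_prime, bits):
    """Montgomery product a * b * R^-1 mod n, with R = 2**bits and n odd."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask      # m = (t mod R) * n' mod R
    u = (t + m * n) >> bits                # exact division by R
    # Final conditional subtraction; real hardware would make this
    # step data-independent as well.
    return u - n if u >= n else u


def modexp_ladder(base, exponent, n, exp_bits):
    """Compute base**exponent mod n for odd n with a Montgomery ladder.

    Every exponent bit triggers exactly one multiply and one square,
    so the sequence of operations does not depend on the key bits.
    """
    bits = n.bit_length()                  # R = 2**bits > n
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r         # n * n' = -1 (mod R); Python 3.8+
    r2 = (r * r) % n                       # used to enter Montgomery form
    x0 = mont_product(1, r2, n, n_prime, bits)         # 1 in Montgomery form
    x1 = mont_product(base % n, r2, n, n_prime, bits)  # base in Montgomery form
    for i in reversed(range(exp_bits)):
        if (exponent >> i) & 1:
            x0 = mont_product(x0, x1, n, n_prime, bits)
            x1 = mont_product(x1, x1, n, n_prime, bits)
        else:
            x1 = mont_product(x0, x1, n, n_prime, bits)
            x0 = mont_product(x0, x0, n, n_prime, bits)
    return mont_product(x0, 1, n, n_prime, bits)       # leave Montgomery form
```

For example, `modexp_ladder(4, 13, 497, 4)` agrees with Python's built-in `pow(4, 13, 497)`. As a side calculation from the figures quoted in the abstract, 504 ms at about 100 MHz corresponds to roughly 50.4 million clock cycles per exponentiation.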

    Applications of Artificial Intelligence to Cryptography

    This paper considers some recent advances in the field of Cryptography using Artificial Intelligence (AI). It specifically considers the applications of Machine Learning (ML) and Evolutionary Computing (EC) to analyze and encrypt data. A short overview is given of Artificial Neural Networks (ANNs) and the principles of Deep Learning using Deep ANNs. In this context, the paper considers: (i) the implementation of EC and ANNs for generating unique and unclonable ciphers; (ii) ML strategies for detecting the genuine randomness (or otherwise) of finite binary strings for applications in Cryptanalysis. The aim of the paper is to provide an overview of how AI can be applied to encrypt data and to cryptanalyze such data and other data types in order to assess the cryptographic strength of an encryption algorithm, e.g. to detect patterns in intercepted data streams that are signatures of encrypted data. This includes some of the authors' prior contributions to the field, which are referenced throughout. Applications are presented which include the authentication of high-value documents such as bank notes with a smartphone. This involves using the antenna of a smartphone to read (in the near field) a flexible radio frequency tag that couples to an integrated circuit with a non-programmable coprocessor. The coprocessor retains ultra-strong encrypted information generated using EC that can be decrypted online, thereby validating the authenticity of the document through the Internet of Things with a smartphone. The application of optical authentication methods using a smartphone and optical ciphers is also briefly explored.

    Vector processor virtualization: distributed memory hierarchy and simultaneous multithreading

    Taking advantage of DLP (Data-Level Parallelism) is indispensable in most data streaming and multimedia applications. Several architectures have been proposed to improve both the performance and the energy consumption of such applications. Superscalar and VLIW (Very Long Instruction Word) processors, along with SIMD (Single-Instruction Multiple-Data) and vector processor (VP) accelerators, are among the options available to designers for meeting their requirements. On the other hand, these choices turn out to be large resource and energy consumers, and they are not always used efficiently because of data dependencies among instructions and the limited portion of vectorizable code in the individual applications that deploy them. This dissertation proposes an innovative architecture for a multithreaded VP which separates the path for performing data shuffle and memory-indexed accesses from the data path for executing other vector instructions that access the memory. This separation speeds up the most common memory access operations by avoiding extra delays and unnecessary stalls. In this multilane-based VP design, each vector lane uses its own private memory to avoid any stalls during memory access instructions. More importantly, the proposed VP has an innovative multithreaded architecture which makes it highly suitable for concurrent sharing in multicore environments. To this end, the VP, which is developed in VHDL and prototyped on an FPGA (Field-Programmable Gate Array), serves as a coprocessor for one or more scalar cores in the various system architectures presented in the dissertation. In the first system architecture, the VP is allocated exclusively to a single scalar core. Benchmarking shows that the VP can achieve very high performance. The inclusion of distributed data shuffle engines across vector lanes has a spectacular impact on the execution time, primarily for applications like the FFT (Fast Fourier Transform) that require large amounts of data shuffling. In the second system architecture, a VP virtualization technique is presented which enables the multithreaded VP to simultaneously execute many threads of various vector lengths. The threads compete simultaneously for the VP resources, with the goal of improving aggregate VP utilization. This approach yields high VP utilization even when the individual threads have low utilization. A vector register file (VRF) virtualization technique dynamically allocates physical vector registers to running threads. The technique is implemented for a multi-core processor embedded in an FPGA. Under the dynamic creation of threads, benchmarking demonstrates large VP speedups and drastic energy savings compared to the first system architecture. In the last system architecture, further improvements focus on VP virtualization relying exclusively on hardware. Moreover, a pipelined data shuffle network replaces the non-pipelined shuffle engines. The VP can then take advantage of identical instruction flows that may be present in different vector applications by running in a fused instruction mode that increases its utilization. A power dissipation model is introduced, as well as two optimization policies for minimizing either the consumed energy or the product of the energy and the runtime for a given application. Benchmarking shows the positive impact of these optimizations.
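The dynamic allocation of physical vector registers to running threads can be pictured with a small software model. The sketch below is only an illustration under assumed behaviour: a free list of physical registers plus a map from (thread, logical register) to physical register. The class name `VectorRegisterFile` and its methods are hypothetical, and the dissertation's actual mechanism, which is later implemented purely in hardware, may differ.

```python
class VectorRegisterFile:
    """Toy model of a virtualized vector register file (assumed design)."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))   # unallocated physical registers
        self.mapping = {}                        # (thread_id, logical_reg) -> physical_reg

    def allocate(self, thread_id, logical_reg):
        """Return the physical register backing a thread's logical register.

        Returns None when no physical register is free, in which case the
        requesting thread would have to wait.
        """
        key = (thread_id, logical_reg)
        if key in self.mapping:
            return self.mapping[key]
        if not self.free:
            return None
        phys = self.free.pop(0)
        self.mapping[key] = phys
        return phys

    def release_thread(self, thread_id):
        """Free every physical register held by a finished thread."""
        keys = [k for k in self.mapping if k[0] == thread_id]
        for k in keys:
            self.free.append(self.mapping.pop(k))
        return len(keys)
```

With two physical registers, two threads can each hold one; a third request returns `None` until one thread finishes and `release_thread` returns its registers to the free list.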

    Easing parallel programming on heterogeneous systems

    The most common way to solve HPC (High Performance Computing) applications in reasonable execution times and in a scalable manner is to use parallel computing systems. The current trend in HPC systems is to include several computing devices of different types and architectures in the same machine. However, their use imposes specific challenges on the programmer. A programmer must be an expert in the existing tools and abstractions for distributed memory, the programming models for shared-memory systems, and the specific programming models for each type of coprocessor, in order to create hybrid programs that can efficiently exploit all the capabilities of the machine. Currently, all these problems must be solved by the programmer, which makes programming a heterogeneous machine a real challenge. This thesis addresses several of the main problems related to the parallel programming of highly heterogeneous and distributed systems. It makes proposals that solve problems ranging from the creation of codes that are portable across different types of devices, accelerators, and architectures while achieving maximum efficiency, to the problems that arise in distributed-memory systems related to communications and the partitioning of data structures. Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Doctorado en Informática

    Neural network computing using on-chip accelerators

    The use of neural networks, machine learning, or artificial intelligence, in its broadest and most controversial sense, has been a tumultuous journey involving three distinct hype cycles and a history dating back to the 1960s. Resurgent, enthusiastic interest in machine learning and its applications bolsters the case for machine learning as a fundamental computational kernel. Furthermore, researchers have demonstrated that machine learning can be used as an auxiliary component of applications to enhance or enable new types of computation such as approximate computing or automatic parallelization. In our view, machine learning becomes not the underlying application, but a ubiquitous component of applications. This view necessitates a different approach to the deployment of machine learning computation, one that spans not only the hardware design of accelerator architectures, but also user and supervisor software to enable the safe, simultaneous use of machine learning accelerator resources. In this dissertation, we propose a multi-transaction model of neural network computation to meet the needs of future machine learning applications. We demonstrate that this model, which encompasses a backend accelerator for inference and learning that is decoupled from the hardware and software for managing neural network transactions, can be achieved with low overhead and integrated with a modern RISC-V microprocessor. Our extensions span user and supervisor software and data structures and, coupled with our hardware, enable multiple transactions from different address spaces to execute simultaneously, yet safely. Together, our system demonstrates the utility of a multi-transaction model for improving energy efficiency and overall accelerator throughput for machine learning applications.

    Current-voltage characteristics of TaSi2/Si and MOS devices using Labview

    Analyses of the current-voltage (I-V) characteristics of Schottky barrier diodes (tantalum silicide) and Metal Oxide Semiconductor (MOS) devices using LabVIEW™ are presented here. LabVIEW™, a graphical program development application, has been used to program a computer-driven Keithley Source Measure Unit (SMU) for device characterization. The SMU, which can be used as a Source Voltage - Measure Current as well as a Source Current - Measure Voltage instrument, is used in the Source Voltage - Measure Current mode in this study. A General Purpose Interface Bus (GPIB, IEEE 488.2) has been used to interface the SMU with LabVIEW™. LabVIEW™ has been successfully used to obtain the current-voltage characteristics of semiconductor devices such as TaSi2/Si and MOS structures. Based on this characterization, factors such as the barrier height for TaSi2/Si and the current conduction mechanisms in MOS device structures have been evaluated.
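One standard way to obtain a Schottky barrier height from forward I-V data is a thermionic-emission fit, sketched below. The abstract does not state which extraction method the work used, so this is an assumption. In the forward region where V is several kT/q, ln I is approximately linear in V. The slope gives the ideality factor n, and the intercept gives the saturation current I_s, from which the barrier height is (kT/q) ln(A A* T^2 / I_s). The function name `fit_schottky` is hypothetical, and the effective Richardson constant A* must be supplied by the caller.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant over elementary charge, in V/K


def fit_schottky(voltages, currents, area_cm2, temperature_k, richardson):
    """Estimate barrier height (V) and ideality factor from forward I-V data.

    Assumes thermionic emission, I = I_s * exp(qV / (n k T)), over the
    supplied voltage range. `richardson` is the effective Richardson
    constant in A cm^-2 K^-2 (an input the caller must choose).
    """
    kt = K_B_OVER_Q * temperature_k
    xs = list(voltages)
    ys = [math.log(i) for i in currents]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least-squares line through (V, ln I).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ideality = 1.0 / (slope * kt)
    i_sat = math.exp(intercept)            # saturation current at V = 0
    barrier = kt * math.log(area_cm2 * richardson * temperature_k ** 2 / i_sat)
    return barrier, ideality
```

The voltage and current lists would come from a Source Voltage - Measure Current sweep such as the one described above. Only points in the exponential forward region should be passed in, since series resistance at high bias and the omitted "-1" term at low bias both bend the line.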