15 research outputs found

    A fully pipelined memoryless 17.8 Gbps AES-128 encryptor

    A fully pipelined implementation of the Advanced Encryption Standard encryption algorithm with 128-bit input and key length (AES-128) was realized on Xilinx Virtex-E and Virtex-II devices. The design, called SIG-AES-E, implements the S-boxes combinatorially and thus requires no internal memory. It is concluded that SIG-AES-E is faster than other published FPGA-based implementations of the AES-128 encryption algorithm.

    Optimized Architecture for AES

    This paper presents a highly optimized architecture for the Advanced Encryption Standard (AES), obtained by dividing and merging (combining) different sub-operations of the AES algorithm. The proposed architecture uses ten levels of pipelining to achieve higher throughput and uses Block RAM to reduce slice utilization, which in turn increases efficiency. It achieves a data throughput of 57 Gbps at a 451 MHz operating frequency and, compared with the best known similar design, improves efficiency (throughput per area) by 36% while using 35% less slice area. Because of its significantly reduced slice utilization, this architecture can easily be embedded with other modules.
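The reported figures are consistent with a pipeline that emits one 128-bit block per clock cycle. The following minimal Python sketch checks that arithmetic; the one-block-per-cycle assumption and the function name are illustrative and are not taken from the paper.

```python
def pipelined_throughput_gbps(clock_mhz: float, block_bits: int = 128) -> float:
    """Peak throughput in Gbps, ASSUMING one block leaves the pipeline per clock cycle."""
    bits_per_second = block_bits * clock_mhz * 1e6
    return bits_per_second / 1e9


# 128 bits x 451 MHz, to compare against the roughly 57 Gbps quoted in the abstract.
print(pipelined_throughput_gbps(451.0))
```

Under that assumption the estimate lands slightly above the quoted 57 Gbps, which suggests the abstract rounds down.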

    PIPELINED DATA PARALLEL MODEL OF ADVANCED ENCRYPTION STANDARD ALGORITHM

    The Advanced Encryption Standard (AES) was officially adopted in 2002 as the new encryption standard algorithm. AES specifies a FIPS-approved cryptographic algorithm that can be used to protect electronic data. It is a symmetric block cipher that can encrypt and decrypt information. This paper develops a pipelined data parallel model of AES. The parallelism in the algorithm is two dimensional. The first dimension is AES inter-stage parallelism (pipelining) and the second dimension is data parallelism. Pipelining parallelism exploits the availability of several processes to execute different stages of different data blocks in parallel. The data parallelism exploits data independence among data blocks to implement data level parallelism. The parallel implementation of AES decreases the time needed for the encryption and decryption processes. We use the ECB mode in the encryption/decryption algorithm of our parallel implementation of AES to implement the parallelization at data level, where data blocks are encrypted and decrypted in parallel. We also develop an MPI-based algorithm to be used with a cluster of workstations (COW). We validate the approach by simulating the model with various input parameters (input data file size, number of processes, communication/computation operation execution time, etc.) and measuring the corresponding performance. Performance metrics include speedup, communication to computation ratio and efficiency. Results show that the performance obtained by the developed model is superior to that of parallel implementations of AES which include only data parallelism or only pipelining.
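To make the two dimensions of parallelism and the speedup and efficiency metrics concrete, here is a minimal Python sketch of a textbook timing model. It is not the paper's simulation model: it assumes equal stage times, ignores communication cost entirely, and all names and input values are hypothetical.

```python
import math


def serial_time(n_blocks: int, n_stages: int, t_stage: float) -> float:
    """One process runs every stage of every block in sequence."""
    return n_blocks * n_stages * t_stage


def pipelined_parallel_time(n_blocks: int, n_stages: int, t_stage: float,
                            n_pipelines: int = 1) -> float:
    """Blocks are split across independent pipelines (data parallelism, as in ECB);
    each pipeline overlaps its stages (pipelining): fill time plus one block per step."""
    blocks_per_pipeline = math.ceil(n_blocks / n_pipelines)
    return (n_stages + blocks_per_pipeline - 1) * t_stage


def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel


def efficiency(speedup_value: float, n_processes: int) -> float:
    return speedup_value / n_processes


# Hypothetical inputs: 1000 blocks, 10 stages, unit stage time, 4 pipelines.
t1 = serial_time(1000, 10, 1.0)
tp = pipelined_parallel_time(1000, 10, 1.0, n_pipelines=4)
s = speedup(t1, tp)
print(t1, tp, s, efficiency(s, n_processes=4 * 10))
```

In this idealized model, combining both dimensions beats either one alone, because pipelining hides the stage latency while data parallelism shortens the stream each pipeline must process. A real cluster would also pay communication costs, which the communication to computation ratio mentioned in the abstract is meant to capture.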

    A reconfigurable and scalable efficient architecture for AES

    ix, 77 leaves : ill. ; 29 cm. A new 32-bit reconfigurable FPGA implementation of the AES algorithm is presented in this thesis. It employs a single round architecture to minimize the hardware cost. The combinational logic implementation of the S-Box makes the design suitable for FPGA devices without Block RAMs (BRAMs). Encryption and key schedule based entirely on the composite field GF((2^4)^2) lead to lower hardware complexity and facilitate efficient subpipelining. For the first time, a subpipelined on-the-fly key schedule over the composite field GF((2^4)^2) is applied for all standard key sizes (128-, 192-, 256-bit). The proposed architecture achieves a throughput of 805.82 Mbits/s using 523 slices, a throughput/slice ratio of 1.54 Mbps/slice, on a Xilinx Virtex2 XC2V2000 ff896 device.

    Cryptarray A Scalable And Reconfigurable Architecture For Cryptographic Applications

    Cryptography is increasingly viewed as a critical technology to fulfill the requirements of security and authentication for information exchange between Internet applications. However, software implementations of cryptographic applications are unable to support the quality of service from a bandwidth perspective required by most Internet applications. As a result, various hardware implementations, from Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), to programmable processors, were proposed to improve this inadequate quality of service. Although these implementations provide performances that are considered better than those produced by software implementations, they still fall short of addressing the bandwidth requirements of most cryptographic applications in the context of the Internet for two major reasons: (i) The majority of these architectures sacrifice flexibility for performance in order to reach the performance level needed for cryptographic applications. This lack of flexibility can be detrimental considering that cryptographic standards and algorithms are still evolving. (ii) These architectures do not consider the consequences of technology scaling in general, and particularly interconnect related problems. As a result, this thesis proposes an architecture that attempts to address the requirements of cryptographic applications by overcoming the obstacles described in (i) and (ii). To this end, we propose a new reconfigurable, two-dimensional, scalable architecture, called CRYPTARRAY, in which bus-based communication is replaced by distributed shared memory communication. At the physical level, the length of the wires will be kept to a minimum. CRYPTARRAY is organized as a chessboard in which the dark and light squares represent Processing Elements (PE) and memory blocks respectively. 
The granularity and resource composition of the PEs is specifically designed to support the computing operations encountered in cryptographic algorithms in general, and symmetric algorithms in particular. Communication can occur only between neighboring PEs through locally shared memory blocks. Because of the chessboard layout, the architecture can be reconfigured to allow computation to proceed as a pipelined wave in any direction. This organization offers a high computational density in terms of datapath resources and a large number of distributed storage resources that easily support a high degree of parallelism and pipelining. Experimental prototyping of a small array on FPGA chips shows that this architecture can run at 80.9 MHz, producing 26,968,716 outputs every second in static reconfiguration mode and 20,226,537 outputs every second in dynamic reconfiguration mode.
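Dividing the reported clock rate by the reported output rates gives the number of clock cycles spent per output in each mode. The short Python sketch below does that division; reading the result as cycles per output is an inference from the numbers and is not stated in the abstract, and the function name is illustrative.

```python
def cycles_per_output(clock_hz: float, outputs_per_second: float) -> float:
    """Average number of clock cycles between consecutive outputs."""
    return clock_hz / outputs_per_second


CLOCK_HZ = 80.9e6  # 80.9 MHz, as quoted (rounded) in the abstract

print(cycles_per_output(CLOCK_HZ, 26_968_716))  # static reconfiguration mode
print(cycles_per_output(CLOCK_HZ, 20_226_537))  # dynamic reconfiguration mode
```

The two ratios come out very close to 3 and 4 respectively, so under this reading dynamic reconfiguration would cost roughly one extra clock cycle per output.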

    Design and analysis of an FPGA-based, multi-processor HW-SW system for SCC applications

    The last 30 years have seen an increase in the complexity of embedded systems from a collection of simple circuits to systems consisting of multiple processors managing a wide variety of devices. This ever increasing complexity frequently requires that high assurance, fail-safe and secure design techniques be applied to protect against possible failures and breaches. To facilitate the implementation of these embedded systems in an efficient way, the FPGA industry recently created new families of devices. New features added to these devices include anti-tamper monitoring, bit stream encryption, and optimized routing architectures for physical and functional logic partition isolation. These devices have high capacities and are capable of implementing processors using their reprogrammable logic structures. This allows for an unprecedented level of hardware and software interaction within a single FPGA chip. High assurance and fail-safe systems can now be implemented within the reconfigurable hardware fabric of an FPGA, enabling these systems to maintain flexibility and achieve high performance while providing a high level of data security. The objective of this thesis was to design and analyze an FPGA-based system containing two isolated, softcore Nios processors that share data through two crypto-engines. FPGA-based single-chip cryptographic (SCC) techniques were employed to ensure proper component isolation when the design is placed on a device supporting the appropriate security primitives. Each crypto-engine is an implementation of the Advanced Encryption Standard (AES), operating in Galois/Counter Mode (GCM) for both encryption and authentication. The features of the microprocessors and architectures of the AES crypto-engines were varied with the goal of determining combinations which best target high performance, minimal hardware usage, or a combination of the two

    VLSI implementation of AES algorithm

    In the present era of information processing through computers, where private information such as bank account details, money transactions, and business deals conducted by video conference is accessed over the internet, encryption of messages in various forms has become inevitable. There are mainly two types of encryption algorithms: private key (also called symmetric key, with a single key for encryption and decryption) and public key (with separate keys for encryption and decryption). In the present work, hardware optimization of the AES architecture has been done in different stages. The hardware comparison results show that the AES architecture has a critical path delay of 9.78 ns when a conventional s-box is used, whereas it has a critical path delay of 8.17 ns using the proposed s-box architecture. The proposed AES architecture requires 86 clock cycles to encrypt 128 bits of data, and therefore the throughput of the AES design on a Xilinx Spartan-6 FPGA is approximately 182.2 Mbits/s. To achieve very high speed, a full custom design of the s-box in composite field has been done for the proposed s-box architecture in Cadence Virtuoso. A novel XOR gate is proposed for use in the s-box design, which is efficient in terms of delay and power and has a high noise margin. The implementation has been done in 180 nm UMC technology. Total dynamic power in the proposed XOR gate is 0.63 µW, compared to 5.27 µW in the existing XOR design. The s-box designed using the proposed XOR occupies a total area of 27348 µm^2. The s-box chip consumes 22.6 µW dynamic power and has a delay of 8.2 ns in post-layout simulation.
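The throughput figure follows from the cycle count and the critical path delay if the clock period is taken to equal the critical path delay. The Python sketch below reproduces that calculation; the clock-period assumption and the function name are illustrative and are not stated in the abstract.

```python
def iterative_throughput_mbps(block_bits: int, cycles_per_block: int,
                              clock_period_ns: float) -> float:
    """Throughput in Mbit/s when one block takes cycles_per_block clock cycles."""
    seconds_per_block = cycles_per_block * clock_period_ns * 1e-9
    return block_bits / seconds_per_block / 1e6


# ASSUMPTION: clock period = critical path delay of the design.
print(iterative_throughput_mbps(128, 86, 8.17))  # proposed s-box
print(iterative_throughput_mbps(128, 86, 9.78))  # conventional s-box, same cycle count assumed
```

With the 8.17 ns delay the result agrees with the approximately 182.2 Mbits/s quoted above. The second line is only a what-if comparison, since the abstract does not give the cycle count for the conventional design.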

    Implementing IPsec using the Five-layer security framework and FPGAs.


    The theoretical development of a new high speed solution for Monte Carlo radiation transport computations

    Advancements in parallel and cluster computing have made many complex Monte Carlo simulations possible in the past several years. Unfortunately, cluster computers are large, expensive, and still not fast enough to make the Monte Carlo technique useful for calculations requiring a near real-time evaluation period. For Monte Carlo simulations, a small computational unit called a Field Programmable Gate Array (FPGA) is capable of bringing the power of a large cluster computer into any personal computer (PC). Because an FPGA is capable of executing Monte Carlo simulations with a high degree of parallelism, a simulation run on a large FPGA can be executed at a much higher rate than an equivalent simulation on a modern single-processor desktop PC. In this thesis, a simple radiation transport problem involving moderate energy photons incident on a three-dimensional target is discussed. By comparing the theoretical evaluation speed of this transport problem on a large FPGA to the evaluation speed of the same transport problem using standard computing techniques, it is shown that it is possible to accelerate Monte Carlo computations significantly using FPGAs. In fact, we have found that our simple photon transport test case can be evaluated in excess of 650 times faster on a large FPGA than on a 3.2 GHz Pentium-4 desktop PC running MCNP5, an acceleration factor that we predict will be largely preserved for most Monte Carlo simulations.

    Analysis and Implementation of an iterative architecture with 3 stages pipeline and 32 bits datapath to an AES-128 co-processor

    Advisor: Luís Geraldo Pedroso Meloni. Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: This work proposes a hardware architecture for a co-processor that performs encryption and decryption according to the AES-128 standard, with support for the ECB, CBC and CTR modes of operation. The proposed architecture employs loop rolling with resource sharing (to reduce the amount of logic required) and sub-pipelining (to increase the operating frequency of the circuit). The datapath width is 32 bits and the number of pipeline stages is 3. This work also documents the results of the OpenAES project. OpenAES is an open source project developed from this work that provides an IP core for an AES co-processor compatible with the AMBA APB protocol. The OpenAES IP core uses the architecture proposed in the first part of this work and adds several features to it, such as DMA support, interrupt generation, and the ability to suspend messages (message prioritization). The project provides the synthesizable Verilog RTL of the IP core, a functional verification environment, a hardware abstraction layer (HAL) written in C and compatible with the ARM CMSIS standard, and a timing constraints script in SDC format. For validation, the IP was prototyped on a SmartFusion A2F200M3F device. Master's degree. Telecommunications and Telematics. Master in Electrical Engineering. CAPE