266 research outputs found

    Speed and Area Optimized Parallel Higher-Radix Modular Multipliers

    Get PDF
    Modular multiplication is the fundamental and compute-intense operation in many Public-Key crypto-systems. This paper presents two modular multipliers with their efficient architectures based on Booth encoding, higher-radix, and Montgomery powering ladder approaches. Montgomery powering ladder technique enables concurrent execution of main operations in the proposed designs, while higher-radix techniques have been adopted to reduce an iteration count which formally dictates a cycle count. It is also shown that by an adopting Booth encoding logic in the designs helps to reduce their area cost with a slight degradation in the maximum achievable frequencies. The proposed designs are implemented in Verilog HDL and synthesized targeting virtex-6 FPGA platform using Xilinx ISE 14.2 Design suite. The radix-4 multiplier computes a 256-bit modular multiplication in 0.93 ms, occupies 1.6K slices, at 137.87 MHz in a cycle count of n/2+2, whereas the radix-8 multiplier completes the operation in 0.69ms, occupies 3.6K slices, achieves 123.43 MHz frequency in a cycle count of n/3+4. The implementation results reveals that the proposed designs consumes 18% lower FPGA slices without any significant performance degradation as compared to their best contemporary designs

    FPGA Based High Speed SPA Resistant Elliptic Curve Scalar Multiplier Architecture

    Get PDF
    The higher computational complexity of an elliptic curve scalar point multiplication operation limits its implementation on general purpose processors. Dedicated hardware architectures are essential to reduce the computational time, which results in a substantial increase in the performance of associated cryptographic protocols. This paper presents a unified architecture to compute modular addition, subtraction, and multiplication operations over a finite field of large prime characteristic GF(p). Subsequently, dual instances of the unified architecture are utilized in the design of high speed elliptic curve scalar multiplier architecture. The proposed architecture is synthesized and implemented on several different Xilinx FPGA platforms for different field sizes. The proposed design computes a 192-bit elliptic curve scalar multiplication in 2.3 ms on Virtex-4 FPGA platform. It is 34% faster and requires 40% fewer clock cycles for elliptic curve scalar multiplication and consumes considerable fewer FPGA slices as compared to the other existing designs. The proposed design is also resistant to the timing and simple power analysis (SPA) attacks; therefore it is a good choice in the construction of fast and secure elliptic curve based cryptographic protocols

    A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

    Full text link
    A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

    Reliable Hardware Architectures for Cyrtographic Block Ciphers LED and HIGHT

    Get PDF
    Cryptographic architectures provide different security properties to sensitive usage models. However, unless reliability of architectures is guaranteed, such security properties can be undermined through natural or malicious faults. In this thesis, two underlying block ciphers which can be used in authenticated encryption algorithms are considered, i.e., LED and HIGHT block ciphers. The former is of the Advanced Encryption Standard (AES) type and has been considered areaefficient, while the latter constitutes a Feistel network structure and is suitable for low-complexity and low-power embedded security applications. In this thesis, we propose efficient error detection architectures including variants of recomputing with encoded operands and signature-based schemes to detect both transient and permanent faults. Authenticated encryption is applied in cryptography to provide confidentiality, integrity, and authenticity simultaneously to the message sent in a communication channel. In this thesis, we show that the proposed schemes are applicable to the case study of Simple Lightweight CFB (SILC) for providing authenticated encryption with associated data (AEAD). The error simulations are performed using Xilinx ISE tool and the results are benchmarked for the Xilinx FPGA family Virtex- 7 to assess the reliability capability and efficiency of the proposed architectures

    A scalable packetised radio astronomy imager

    Get PDF
    Includes bibliographical referencesModern radio astronomy telescopes the world over require digital back-ends. The complexity of these systems depends on many site-specific factors, including the number of antennas, beams and frequency channels and the bandwidth to be processed. With the increasing popularity for ever larger interferometric arrays, the processing requirements for these back-ends have increased significantly. While the techniques for building these back-ends are well understood, every installation typically still takes many years to develop as the instruments use highly specialised, custom hardware in order to cope with the demanding engineering requirements. Modern technology has enabled reprogrammable FPGA-based processing boards, together with packet-based switching techniques, to perform all the digital signal processing requirements of a modern radio telescope array. The various instruments used by radio telescopes are functionally very different, but the component operations remain remarkably similar and many share core functionalities. Generic processing platforms are thus able to share signal processing libraries and can acquire different personalities to perform different functions simply by reprogramming them and rerouting the data appropriately. Furthermore, Ethernet-based packet-switched networks are highly flexible and scalable, enabling the same instrument design to be scaled to larger installations simply by adding additional processing nodes and larger network switches. The ability of a packetised network to transfer data to arbitrary processing nodes, along with these nodes' reconfigurability, allows for unrestrained partitioning of designs and resource allocation. This thesis describes the design and construction of the first working radio astronomy imaging instrument hosted on Ethernet-interconnected re- programmable FPGA hardware. I attempt to establish an optimal packetised architecture for the most popular instruments with particular attention to the core array functions of correlation and beamforming. Emphasis is placed on requirements for South Africa's MeerKAT array. A demonstration system is constructed and deployed on the KAT-7 array, MeerKAT's prototype. This research promises reduced instrument development time, lower costs, improved reliability and closer collaboration between telescope design teams

    Fast Modular Reduction for Large-Integer Multiplication

    Get PDF
    The work contained in this thesis is a representation of the successful attempt to speed-up the modular reduction as an independent step of modular multiplication, which is the central operation in public-key cryptosystems. Based on the properties of Mersenne and Quasi-Mersenne primes, four distinct sets of moduli have been described, which are responsible for converting the single-precision multiplication prevalent in many of today\u27s techniques into an addition operation and a few simple shift operations. A novel algorithm has been proposed for modular folding. With the backing of the special moduli sets, the proposed algorithm is shown to outperform (speed-wise) the Modified Barrett algorithm by 80% for operands of length 700 bits, the least speed-up being around 70% for smaller operands, in the range of around 100 bits

    Lightweight Architectures for Reliable and Fault Detection Simon and Speck Cryptographic Algorithms on FPGA

    Get PDF
    The widespread use of sensitive and constrained applications necessitates lightweight (lowpower and low-area) algorithms developed for constrained nano-devices. However, nearly all of such algorithms are optimized for platform-based performance and may not be useful for diverse and flexible applications. The National Security Agency (NSA) has proposed two relatively-recent families of lightweight ciphers, i.e., Simon and Speck, designed as efficient ciphers on both hardware and software platforms. This paper proposes concurrent error detection schemes to provide reliable architectures for these two families of lightweight block ciphers. The research work on analyzing the reliability of these algorithms and providing fault diagnosis approaches has not been undertaken to date to the best of our knowledge. The main aim of the proposed reliable architectures is to provide high error coverage while maintaining acceptable area and power consumption overheads. To achieve this, we propose a variant of recomputing with encoded operands. These low-complexity schemes are suited for lowresource applications such as sensitive, constrained implantable and wearable medical devices. We perform fault simulations for the proposed architectures by developing a fault model framework. The architectures are simulated and analyzed on recent field-programmable grate array (FPGA) platforms, and it is shown that the proposed schemes provide high error coverage. The proposed low-complexity concurrent error detection schemes are a step forward towards more reliable architectures for Simon and Speck algorithms in lightweight, secure applications

    A Survey on FPGA-Based Sensor Systems: Towards Intelligent and Reconfigurable Low-Power Sensors for Computer Vision, Control and Signal Processing

    Get PDF
    The current trend in the evolution of sensor systems seeks ways to provide more accuracy and resolution, while at the same time decreasing the size and power consumption. The use of Field Programmable Gate Arrays (FPGAs) provides specific reprogrammable hardware technology that can be properly exploited to obtain a reconfigurable sensor system. This adaptation capability enables the implementation of complex applications using the partial reconfigurability at a very low-power consumption. For highly demanding tasks FPGAs have been favored due to the high efficiency provided by their architectural flexibility (parallelism, on-chip memory, etc.), reconfigurability and superb performance in the development of algorithms. FPGAs have improved the performance of sensor systems and have triggered a clear increase in their use in new fields of application. A new generation of smarter, reconfigurable and lower power consumption sensors is being developed in Spain based on FPGAs. In this paper, a review of these developments is presented, describing as well the FPGA technologies employed by the different research groups and providing an overview of future research within this field.The research leading to these results has received funding from the Spanish Government and European FEDER funds (DPI2012-32390), the Valencia Regional Government (PROMETEO/2013/085) and the University of Alicante (GRE12-17)

    Architectures multi-Asip pour turbo récepteur flexible

    Get PDF
    Rapidly evolving wireless standards use modern techniques such as turbo codes, Bit Interleaved coded Modulation (BICM), high order QAM constellation, Signal Space Diversity (SSD), Multi-Input Multi-Output (MIMO) Spatial Multiplexing (SM) and Space Time Codes (STC) with different parameters for reliable high rate data transmissions. Adoption of such techniques in the transmitter can impact the receiver architecture in three ways: (1) the complex processing related to advanced techniques such as turbo codes, encourage to perform iterative processing in the receiver to improve error rate performance (2) to satisfy high throughput requirement for an iterative receiver, parallel processing is mandatory and finally (3) to allow the support of different techniques and parameters imposed, programmable yet high throughput hardware processing elements are required. In this thesis, to address the high throughput requirement with turbo processing, first of all a study of parallelism on turbo decoding is extended for turbo demodulation and turbo equalization. Based on the results acquired from the parallelism study a flexible high throughput heterogeneous multi-ASIP NoC based unified turbo receiver is proposed. The proposed architecture fulfils the target requirements in a way that: (a) Application Specific Instruction-set Processor (ASIP) exploits metric generation level parallelism and implements the required flexibility, (b) throughputs beyond the capacity of single ASIP in a turbo process are achieved through multiple ASIP elements implementing sub-block parallelism and shuffled processing and finally (c) Network on Chip is used to handle communication conflicts during parallel processing of multiple ASIPs. In pursuit to achieve a hardware model of the proposed architecture two ASIPs are conceived where the first one, namely EquASIP, is dedicated for MMSE-IC equalization and provides a flexible solution for multiple MIMO techniques adopted in multiple wireless standards with a capability to work in turbo equalization context. The second ASIP, named as DemASIP, is a flexible demapper which can be used in MIMO or single antenna environment for any modulation till 256-QAM with or without iterative demodulation. Using available TurbASIP and NoC components, the thesis concludes on an FPGA prototype of heterogeneous multi-ASIP NoC based unified turbo receiver which integrates 9 instances of 3 different ASIPs with 2 NoCs.Les normes de communication sans fil, sans cesse en évolution, imposent l'utilisation de techniques modernes telles que les turbocodes, modulation codée à entrelacement bit (BICM), constellation MAQ d'ordre élevé, diversité de constellation (SSD), multiplexage spatial et codage espace-temps multi-antennes (MIMO) avec des paramètres différents pour des transmissions fiables et de haut débit. L'adoption de ces techniques dans l'émetteur peut influencer l'architecture du récepteur de trois façons: (1) les traitement complexes relatifs aux techniques avancées comme les turbocodes, encourage à effectuer un traitement itératif dans le récepteur pour améliorer la performance en termes de taux d'erreur (2) pour satisfaire l'exigence de haut débit avec un récepteur itératif, le recours au parallélisme est obligatoire et enfin (3) pour assurer le support des différentes techniques et paramètres imposées, des processeurs de traitement matériel flexibles, mais aussi de haute performance, sont nécessaires. Dans cette thèse, pour répondre aux besoins de haut débit dans un contexte de traitement itératif, tout d'abord une étude de parallélisme sur le turbo décodage a été étendue aux applications de turbo démodulation et turbo égalisation. Partant des résultats obtenus à partir de l'étude du parallélisme, un récepteur itératif unifié basé sur un modèle d'architecture multi-ASIP hétérogène intégrant un réseau sur puce (NoC) a été proposé. L'architecture proposée répond aux exigences visées d'une manière où: (a) le concept de processeur à jeu d'instruction dédié à l'application (ASIP) exploite le parallélisme du niveau de génération de métriques et met en oeuvre la flexibilité nécessaire, (b) les débits au-delà de la capacité d'un seul ASIP dans un processus itératif sont obtenus au moyen de multiples ASIP implémentant le parallélisme de sous-blocs et le traitement combiné et enfin (c) le concept de réseau sur puce (NoC) est utilisé pour gérer les conflits de communication au cours du traitement parallèle itératif multi-ASIP. Dans le but de parvenir à un modèle matériel de l'architecture proposée, deux ASIP ont été conçus où le premier, nommé EquASIP, est dédié à l'égalisation MMSE-IC et fournit une solution flexible pour de multiples techniques multi-antennes adoptés dans plusieurs normes sans fil avec la capacité de travailler dans un contexte de turbo égalisation. Le deuxième ASIP, nommé DemASIP, est un démappeur flexible qui peut être utilisé dans un environnement multi-antennes et pour tout type de modulation jusqu'à MAQ-256 avec ou sans démodulation itérative. En intégrant ces ASIP, en plus des NoC et TurbASIP disponibles à Télécom Bretagne, la thèse conclut sur un prototype FPGA d'un récepteur itératif unifié multi-ASIP qui intègre 9 coeurs de 3 différents types d'ASIP avec 2 NoC

    Private and Public-Key Side-Channel Threats Against Hardware Accelerated Cryptosystems

    Get PDF
    Modern side-channel attacks (SCA) have the ability to reveal sensitive data from non-protected hardware implementations of cryptographic accelerators whether they be private or public-key systems. These protocols include but are not limited to symmetric, private-key encryption using AES-128, 192, 256, or public-key cryptosystems using elliptic curve cryptography (ECC). Traditionally, scalar point (SP) operations are compelled to be high-speed at any cost to reduce point multiplication latency. The majority of high-speed architectures of contemporary elliptic curve protocols rely on non-secure SP algorithms. This thesis delivers a novel design, analysis, and successful results from a custom differential power analysis attack on AES-128. The resulting SCA can break any 16-byte master key the sophisticated cipher uses and it\u27s direct applications towards public-key cryptosystems will become clear. Further, the architecture of a SCA resistant scalar point algorithm accompanied by an implementation of an optimized serial multiplier will be constructed. The optimized hardware design of the multiplier is highly modular and can use either NIST approved 233 & 283-bit Kobliz curves utilizing a polynomial basis. The proposed architecture will be implemented on Kintex-7 FPGA to later be integrated with the ARM Cortex-A9 processor on the Zynq-7000 AP SoC (XC7Z045) for seamless data transfer and analysis of the vulnerabilities SCAs can exploit
    • …
    corecore