37,099 research outputs found

    Arithmetic Operations in Multi-Valued Logic

    Full text link
    This paper presents arithmetic operations like addition, subtraction and multiplications in Modulo-4 arithmetic, and also addition, multiplication in Galois field, using multi-valued logic (MVL). Quaternary to binary and binary to quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. Logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, keeping minimum number of gates and depth of net in to consideration. Quaternary multiplier circuit is proposed to achieve required optimization. Simulation result of each operation is shown separately using Hspice.Comment: 12 Pages, VLSICS Journal 201

    Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators

    Get PDF
    Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor enables to exploit the possibility of embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 µs, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low cost Zynq 7Z007S device

    Faster 64-bit universal hashing using carry-less multiplications

    Get PDF
    Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set in their x64 processors. We use CLMUL to implement an almost universal 64-bit hash family (CLHASH). We compare this new family with what might be the fastest almost universal family on x64 processors (VHASH). We find that CLHASH is at least 60% faster. We also compare CLHASH with a popular hash function designed for speed (Google's CityHash). We find that CLHASH is 40% faster than CityHash on inputs larger than 64 bytes and just as fast otherwise

    On fast multiplication of a matrix by its transpose

    Get PDF
    We present a non-commutative algorithm for the multiplication of a 2x2-block-matrix by its transpose using 5 block products (3 recursive calls and 2 general products) over C or any finite field.We use geometric considerations on the space of bilinear forms describing 2x2 matrix products to obtain this algorithm and we show how to reduce the number of involved additions.The resulting algorithm for arbitrary dimensions is a reduction of multiplication of a matrix by its transpose to general matrix product, improving by a constant factor previously known reductions.Finally we propose schedules with low memory footprint that support a fast and memory efficient practical implementation over a finite field.To conclude, we show how to use our result in LDLT factorization.Comment: ISSAC 2020, Jul 2020, Kalamata, Greec

    Multiplication theory for dynamically biased avalanche photodiodes: new limits for gain bandwidth product

    Get PDF
    Novel theory is developed for the avalanche multiplication process in avalanche photodiodes (APDs) under time-varying reverse-biasing conditions. Integral equations are derived characterizing the statistics of the multiplication factor and the impulse-response function of APDs, as well as their breakdown probability, all under the assumption that the electric field driving the avalanche process is time varying and spatially nonuniform. Numerical calculations generated by the model predict that by using a bit-synchronous sinusoidal biasing scheme to operate the APD in an optical receiver, the pulse-integrated gain-bandwidth product can be improved by a factor of 5 compared to the same APD operating under the conventional static biasing. The bit-synchronized periodic modulation of the electric field in the multiplication region serves to (1) produce large avalanche multiplication factors with suppressed avalanche durations for photons arriving in the early phase of each optical pulse; and (2) generate low avalanche gains and very short avalanche durations for photons arriving in the latter part of each optical pulse. These two factors can work together to reduce intersymbol interference in optical receivers without sacrificing sensitivity

    Efficient dot product over word-size finite fields

    Full text link
    We want to achieve efficiency for the exact computation of the dot product of two vectors over word-size finite fields. We therefore compare the practical behaviors of a wide range of implementation techniques using different representations. The techniques used include oating point representations, discrete logarithms, tabulations, Montgomery reduction, delayed modulus

    Strongly universal string hashing is fast

    Get PDF
    We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycle per byte. Maybe surprisingly, we find that these families---though they require a large buffer of random numbers---are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom is that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families.Comment: Software is available at http://code.google.com/p/variablelengthstringhashing/ and https://github.com/lemire/StronglyUniversalStringHashin

    Decoding Generalized Reed-Solomon Codes and Its Application to RLCE Encryption Schemes

    Get PDF
    This paper compares the efficiency of various algorithms for implementing quantum resistant public key encryption scheme RLCE on 64-bit CPUs. By optimizing various algorithms for polynomial and matrix operations over finite fields, we obtained several interesting (or even surprising) results. For example, it is well known (e.g., Moenck 1976 \cite{moenck1976practical}) that Karatsuba's algorithm outperforms classical polynomial multiplication algorithm from the degree 15 and above (practically, Karatsuba's algorithm only outperforms classical polynomial multiplication algorithm from the degree 35 and above ). Our experiments show that 64-bit optimized Karatsuba's algorithm will only outperform 64-bit optimized classical polynomial multiplication algorithm for polynomials of degree 115 and above over finite field GF(210)GF(2^{10}). The second interesting (surprising) result shows that 64-bit optimized Chien's search algorithm ourperforms all other 64-bit optimized polynomial root finding algorithms such as BTA and FFT for polynomials of all degrees over finite field GF(210)GF(2^{10}). The third interesting (surprising) result shows that 64-bit optimized Strassen matrix multiplication algorithm only outperforms 64-bit optimized classical matrix multiplication algorithm for matrices of dimension 750 and above over finite field GF(210)GF(2^{10}). It should be noted that existing literatures and practices recommend Strassen matrix multiplication algorithm for matrices of dimension 40 and above. All our experiments are done on a 64-bit MacBook Pro with i7 CPU and single thread C codes. It should be noted that the reported results should be appliable to 64 or larger bits CPU architectures. For 32 or smaller bits CPUs, these results may not be applicable. The source code and library for the algorithms covered in this paper are available at http://quantumca.org/
    corecore