Arithmetic Operations in Multi-Valued Logic
This paper presents arithmetic operations such as addition, subtraction and
multiplication in modulo-4 arithmetic, as well as addition and multiplication
in a Galois field, using multi-valued logic (MVL). Quaternary-to-binary and
binary-to-quaternary converters are designed using down-literal circuits.
Negation in modular arithmetic is designed with only one gate. The logic design
of each operation is obtained by reducing the terms using Karnaugh diagrams,
keeping the number of gates and the depth of the network to a minimum. A
quaternary multiplier circuit is proposed to achieve the required optimization.
Simulation results for each operation are shown separately using HSPICE.
Comment: 12 Pages, VLSICS Journal 201
Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators
Security management for IoT applications is a critical research field, especially given the wide performance variation across very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by the Transport Layer Security (TLS) protocol with the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design: the hardware accelerator core implements Elliptic Curve Scalar Multiplication (ECSM), the core operation of Elliptic Curve Cryptosystems (ECC), while the overall TLS scheme is controlled in software by an ARM Cortex-A9 microprocessor. Building the ECC accelerator core around an ARM microprocessor not only speeds up ECSM execution but also improves the performance of the overall cryptosystem. The integrated ARM processor also makes it possible to exploit embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires little area, only 3395 LUTs on the Zynq device, and performs high-speed 233-bit ECSMs in 413 µs with a 50 MHz clock. Moreover, generating a 384-bit TLS handshake secret key between client and server coordinators takes 67.5 ms on a low-cost Zynq 7Z007S device.
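ECSM computes kP by repeated point doubling and addition. The sketch below shows left-to-right double-and-add on a toy prime-field curve y^2 = x^3 + 2x + 3 over GF(97) with base point (3, 6); these parameters are hypothetical and chosen for readability, whereas the accelerator works on 233-bit operands, and neither its curve nor its scalar-multiplication algorithm is reproduced here. This version is not constant-time and is unsuitable for real keys. The final lines illustrate the ECDHE idea that both parties reach the same shared point (requires Python 3.8+ for `pow(x, -1, m)`).

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); hypothetical parameters for illustration only.
P_MOD, A, B = 97, 2, 3
G = (3, 6)   # 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97), so G lies on the curve
INF = None   # point at infinity (group identity)

def on_curve(pt):
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0

def point_add(p1, p2):
    """Affine point addition; handles the identity, inverse points and doubling."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD  # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, pt):
    """Left-to-right double-and-add for k >= 0: double per bit, add on each 1 bit."""
    result = INF
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, pt)
    return result

if __name__ == "__main__":
    alice, bob = 7, 11  # hypothetical private scalars
    print(scalar_mult(alice, scalar_mult(bob, G)) == scalar_mult(bob, scalar_mult(alice, G)))  # prints True
```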
Faster 64-bit universal hashing using carry-less multiplications
Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set
in their x64 processors. We use CLMUL to implement an almost universal 64-bit
hash family (CLHASH). We compare this new family with what might be the fastest
almost universal family on x64 processors (VHASH). We find that CLHASH is at
least 60% faster. We also compare CLHASH with a popular hash function designed
for speed (Google's CityHash). We find that CLHASH is 40% faster than CityHash
on inputs larger than 64 bytes and just as fast otherwise.
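A carry-less multiplication treats its 64-bit operands as polynomials over GF(2): partial products are combined with XOR, so no carries propagate and the result fits in at most 127 bits. The bit-by-bit emulation below is a reference model of what one PCLMULQDQ operation computes; it is far slower than the instruction and does not reproduce the CLHASH construction itself.

```python
def clmul64(a, b):
    """Carry-less product of two unsigned 64-bit operands; the result fits in 127 bits."""
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("operands must be unsigned 64-bit values")
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i  # partial products combined with XOR, so no carries
    return result

if __name__ == "__main__":
    print(3 * 3, clmul64(3, 3))  # ordinary product 9 vs carry-less product 5
```

Because the product distributes over XOR, it behaves like polynomial multiplication over GF(2), which is what makes it usable as a building block for almost universal hashing.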
On fast multiplication of a matrix by its transpose
We present a non-commutative algorithm for multiplying a 2x2 block matrix by
its transpose using 5 block products (3 recursive calls and 2 general products)
over C or any finite field. We use geometric considerations on the space of
bilinear forms describing 2x2 matrix products to obtain this algorithm, and we
show how to reduce the number of additions involved. For arbitrary dimensions,
the resulting algorithm reduces the multiplication of a matrix by its transpose
to general matrix products, improving previously known reductions by a constant
factor. Finally, we propose schedules with a low memory footprint that support
a fast and memory-efficient practical implementation over a finite field. To
conclude, we show how to use our result in LDLT factorization.
Comment: ISSAC 2020, Jul 2020, Kalamata, Greece
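For comparison with the 5-product scheme, the sketch below shows the straightforward recursion that only exploits the symmetry of A·Aᵀ. Splitting A into 2x2 blocks, it uses 4 recursive calls (A11·A11ᵀ, A12·A12ᵀ, A21·A21ᵀ, A22·A22ᵀ) and 2 general products for the off-diagonal block, then obtains the remaining block by transposition: 6 block products in total. This is a natural baseline, not the paper's algorithm; the prime P = 65521 and the power-of-two size restriction are assumptions for the example.

```python
P = 65521  # hypothetical word-size prime for the finite-field setting

def mul_transpose(X, Y):
    # General product X * Y^T mod P (classical): entry (i, j) = row_i(X) . row_j(Y)
    return [[sum(a * b for a, b in zip(rx, ry)) % P for ry in Y] for rx in X]

def mat_add(X, Y):
    return [[(a + b) % P for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def split(A):
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])

def aat(A):
    """A * A^T mod P for a square A whose size is a power of two (baseline recursion)."""
    if len(A) <= 2:
        return mul_transpose(A, A)
    A11, A12, A21, A22 = split(A)
    C11 = mat_add(aat(A11), aat(A12))  # 2 recursive calls
    C22 = mat_add(aat(A21), aat(A22))  # 2 recursive calls
    C21 = mat_add(mul_transpose(A21, A11), mul_transpose(A22, A12))  # 2 general products
    C12 = transpose(C21)  # symmetry of A * A^T gives this block without a product
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

Replacing the 4 recursive calls by 3 while keeping 2 general products is exactly the saving the abstract claims.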
Multiplication theory for dynamically biased avalanche photodiodes: new limits for gain bandwidth product
A novel theory is developed for the avalanche multiplication process in avalanche photodiodes (APDs) under time-varying reverse-bias conditions. Integral equations are derived that characterize the statistics of the multiplication factor and the impulse-response function of APDs, as well as their breakdown probability, all under the assumption that the electric field driving the avalanche process is time-varying and spatially nonuniform. Numerical calculations based on the model predict that operating the APD in an optical receiver with a bit-synchronous sinusoidal biasing scheme can improve the pulse-integrated gain-bandwidth product by a factor of 5 compared with the same APD under conventional static biasing. The bit-synchronized periodic modulation of the electric field in the multiplication region serves to (1) produce large avalanche multiplication factors with suppressed avalanche durations for photons arriving in the early phase of each optical pulse, and (2) generate low avalanche gains and very short avalanche durations for photons arriving in the latter part of each optical pulse. Together, these two effects can reduce intersymbol interference in optical receivers without sacrificing sensitivity.
Efficient dot product over word-size finite fields
We aim to compute the dot product of two vectors over word-size finite fields
exactly and efficiently. We therefore compare the practical behavior of a wide
range of implementation techniques using different representations. The
techniques used include floating-point representations, discrete logarithms,
tabulations, Montgomery reduction, delayed modulus
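Delayed modular reduction, one of the techniques listed, accumulates several products in a wide machine word and reduces modulo p only when the next product could overflow it. The sketch below emulates an unsigned 64-bit accumulator in Python; the number k of products between reductions follows from requiring (p - 1) + k(p - 1)^2 ≤ 2^64 - 1, which gives k = 4 for p = 2^31 - 1. It assumes inputs already reduced to [0, p) and is an illustration, not the paper's implementation.

```python
WORD_MAX = (1 << 64) - 1  # largest value of an unsigned 64-bit accumulator

def max_delay(p):
    """How many products (each at most (p-1)^2) fit on top of a reduced value (< p)."""
    return (WORD_MAX - (p - 1)) // (p - 1) ** 2

def delayed_dot(u, v, p):
    """Dot product of u and v mod p with delayed reduction; entries assumed in [0, p)."""
    k = max_delay(p)
    if k < 1:
        raise ValueError("p is too large for a 64-bit accumulator")
    acc, pending = 0, 0
    for a, b in zip(u, v):
        acc += a * b
        pending += 1
        assert acc <= WORD_MAX  # a real 64-bit register would have overflowed otherwise
        if pending == k:
            acc %= p  # one reduction per k products instead of one per product
            pending = 0
    return acc % p
```

Smaller primes allow much longer runs between reductions, which is where the technique pays off most.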
Strongly universal string hashing is fast
We present fast strongly universal string hashing families: they can process
data at a rate of 0.2 CPU cycles per byte. Perhaps surprisingly, we find that
these families, though they require a large buffer of random numbers, are
often faster than popular hash functions with weaker theoretical guarantees.
Moreover, conventional wisdom holds that hash functions with fewer
multiplications are faster. Yet we find that they may fail to be faster due to
operation pipelining. We present experimental results on several processors,
including low-powered processors. Our tests include hash functions designed for
processors with the Carry-Less Multiplication (CLMUL) instruction set. We also
prove, using accessible proofs, the strong universality of our families.
Comment: Software is available at
http://code.google.com/p/variablelengthstringhashing/ and
https://github.com/lemire/StronglyUniversalStringHashin
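One strongly universal family of this kind is a multilinear hash: with random 64-bit keys m0, m1, ..., mn and 32-bit input words s1, ..., sn, it outputs the top 32 bits of (m0 + m1·s1 + ... + mn·sn) mod 2^64. We assume this family for illustration; the families actually benchmarked are in the linked software. The n + 1 keys correspond to a buffer of random numbers that grows with the input, and the n multiplications are independent of one another, so a processor can pipeline them.

```python
import random

MASK64 = (1 << 64) - 1

def make_keys(n_words, seed=None):
    # n_words + 1 random 64-bit keys: m0 plus one multiplier per input word.
    # random.Random is a non-cryptographic generator, used here only for the sketch.
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(n_words + 1)]

def multilinear_hash(words, keys):
    """32-bit hash of 32-bit words: ((m0 + sum m_i * s_i) mod 2^64) >> 32."""
    if len(keys) < len(words) + 1:
        raise ValueError("need len(words) + 1 keys")
    acc = keys[0]
    for m, s in zip(keys[1:], words):
        acc = (acc + m * s) & MASK64  # wraps around like unsigned 64-bit arithmetic
    return acc >> 32

if __name__ == "__main__":
    keys = make_keys(4, seed=42)
    data = [0x61626364, 0x65666768, 7, 0]
    print(hex(multilinear_hash(data, keys)))
```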
Decoding Generalized Reed-Solomon Codes and Its Application to RLCE Encryption Schemes
This paper compares the efficiency of various algorithms for implementing the
quantum-resistant public key encryption scheme RLCE on 64-bit CPUs. By
optimizing various algorithms for polynomial and matrix operations over finite
fields, we obtained several interesting (or even surprising) results. For
example, it is well known (e.g., Moenck 1976) that Karatsuba's algorithm
outperforms the classical polynomial multiplication algorithm from degree 15
and above (in practice, Karatsuba's algorithm only outperforms the classical
algorithm from degree 35 and above). Our experiments show that 64-bit optimized
Karatsuba's algorithm only outperforms 64-bit optimized classical polynomial
multiplication for polynomials of degree 115 and above over finite field
. The second interesting (surprising) result shows that 64-bit
optimized Chien's search algorithm outperforms all other 64-bit optimized
polynomial root-finding algorithms, such as BTA and FFT, for polynomials of all
degrees over finite field . The third interesting (surprising)
result shows that the 64-bit optimized Strassen matrix multiplication algorithm
only outperforms the 64-bit optimized classical matrix multiplication algorithm
for matrices of dimension 750 and above over finite field . Note
that existing literature and practice recommend Strassen's matrix
multiplication algorithm for matrices of dimension 40 and above. All our
experiments were done on a 64-bit MacBook Pro with an i7 CPU, using
single-threaded C code. The reported results should be applicable to CPU
architectures with 64-bit or wider words; for CPUs with 32-bit or narrower
words, they may not apply. The source code and library for the algorithms
covered in this paper are available at http://quantumca.org/
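Crossover results like these are typically exploited with a hybrid multiplier: Karatsuba recursion on large inputs, falling back to the classical (schoolbook) algorithm below a tuned size. The sketch below uses coefficients modulo a hypothetical prime P = 1021 rather than the paper's field, and its `cutoff` parameter counts coefficients rather than degree. The paper measured a crossover near degree 115 for its own 64-bit optimized code; the right threshold depends on the implementation and has to be measured.

```python
P = 1021  # hypothetical prime modulus; the paper's field is not reproduced here

def _combine(a, b, sign):
    # Coefficient-wise a + sign*b mod P, padding the shorter list with zeros
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + sign * (b[i] if i < len(b) else 0)) % P
            for i in range(n)]

def schoolbook(a, b):
    """Classical O(n^2) product of coefficient lists (lowest degree first)."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def karatsuba(a, b, cutoff=32):
    """Karatsuba recursion that switches to schoolbook at or below `cutoff` coefficients."""
    if not a or not b:
        return []
    n = max(len(a), len(b))
    if n <= max(cutoff, 1):
        return schoolbook(a, b)
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    z0 = karatsuba(a0, b0, cutoff)
    z2 = karatsuba(a1, b1, cutoff)
    # (a0 + a1)(b0 + b1) - z0 - z2 = a0*b1 + a1*b0: three products instead of four
    mid = karatsuba(_combine(a0, a1, 1), _combine(b0, b1, 1), cutoff)
    z1 = _combine(_combine(mid, z0, -1), z2, -1)
    out = [0] * (len(a) + len(b) - 1)
    for shift, z in ((0, z0), (h, z1), (2 * h, z2)):
        for i, c in enumerate(z):
            out[i + shift] = (out[i + shift] + c) % P
    return out
```

Timing `karatsuba` against `schoolbook` for a range of cutoffs is the kind of experiment that locates the crossover on a given machine.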