698 research outputs found
A versatile Montgomery multiplier architecture with characteristic three support
We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%
Low-power Programmable Processor for Fast Fourier Transform Based on Transport Triggered Architecture
This paper describes a low-power processor tailored for fast Fourier
transform computations where transport triggering template is exploited. The
processor is software-programmable while retaining an energy-efficiency
comparable to existing fixed-function implementations. The power savings are
achieved by compressing the computation kernel into one instruction word. The
word is stored in an instruction loop buffer, which is more power-efficient
than regular instruction memory storage. The processor supports all
power-of-two FFT sizes from 64 to 16384 and given 1 mJ of energy, it can
compute 20916 transforms of size 1024.Comment: 5 pages, 4 figures, 1 table, ICASSP 2019 conferenc
Arithmetic Operations in Multi-Valued Logic
This paper presents arithmetic operations like addition, subtraction and
multiplications in Modulo-4 arithmetic, and also addition, multiplication in
Galois field, using multi-valued logic (MVL). Quaternary to binary and binary
to quaternary converters are designed using down literal circuits. Negation in
modular arithmetic is designed with only one gate. Logic design of each
operation is achieved by reducing the terms using Karnaugh diagrams, keeping
minimum number of gates and depth of net in to consideration. Quaternary
multiplier circuit is proposed to achieve required optimization. Simulation
result of each operation is shown separately using Hspice.Comment: 12 Pages, VLSICS Journal 201
Modeling Algorithms in SystemC and ACL2
We describe the formal language MASC, based on a subset of SystemC and
intended for modeling algorithms to be implemented in hardware. By means of a
special-purpose parser, an algorithm coded in SystemC is converted to a MASC
model for the purpose of documentation, which in turn is translated to ACL2 for
formal verification. The parser also generates a SystemC variant that is
suitable as input to a high-level synthesis tool. As an illustration of this
methodology, we describe a proof of correctness of a simple 32-bit radix-4
multiplier.Comment: In Proceedings ACL2 2014, arXiv:1406.123
Low-power, high-speed FFT processor for MB-OFDM UWB application
This paper presents a low-power, high-speed 4-data-path 128-point mixed-radix (radix-2 & radix-2 2 ) FFT processor for MB-OFDM Ultra-WideBand (UWB) systems. The processor employs the single-path delay feedback (SDF) pipelined structure for the proposed algorithm, it uses substructure-sharing multiplication units and shift-add structure other than traditional complex multipliers. Furthermore, the word lengths are properly chosen, thus the hardware costs and power consumption of the proposed FFT processor are efficiently reduced. The proposed FFT processor is verified and synthesized by using 0.13 µm CMOS technology with a supply voltage of 1.32 V. The implementation results indicate that the proposed 128-point mixed-radix FFT architecture supports a throughput rate of 1Gsample/s with lower power consumption in comparison to existing 128-point FFT architecture
- …