Synthesis and Optimization of Reversible Circuits - A Survey

Author: Arabzadeh M.
Cheung D.
Cuccaro S. A.
De Vos A.
Douçot B.
Fazel K.
Glück R.
Hirata Y.
Igor L. Markov
Korf R.
Kutin S.
Kutin S. A.
Lee S.
Markov I. L.
Mehdi Saeedi
Miller D.
Mishchenko A.
Patel K. N.
Politi A.
Saeedi M.
Shende V. V.
Shi Z.
Soeken M.
Storme L.
Takahashi Y.
Viamontes G. F.
Wille R.
Yamashita S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/03/2013

Reversible logic circuits have been historically motivated by theoretical research in low-power electronics as well as practical improvement of bit-manipulation transforms in cryptography and computer graphics. Recently, reversible circuits have attracted interest as components of quantum algorithms, as well as in photonic and nano-computing technologies where some switching devices offer no signal gain. Research in generating reversible logic distinguishes between circuit synthesis, post-synthesis optimization, and technology mapping. In this survey, we review algorithmic paradigms (search-based, cycle-based, transformation-based, and BDD-based) as well as specific algorithms for reversible synthesis, both exact and heuristic. We conclude the survey by outlining key open challenges in synthesis of reversible and quantum logic, as well as the most common misconceptions. Comment: 34 pages, 15 figures, 2 tables
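
As a small illustration of what "reversible" means in the abstract above, the following sketch models a circuit as a cascade of gates on three wires and checks that it maps distinct inputs to distinct outputs. The helper names (toffoli, cnot, apply_circuit, is_reversible) are hypothetical and chosen for this example; they are not taken from the survey, and the brute-force check is only practical for very small circuits.

```python
from itertools import product

def toffoli(a, b, c):
    """Toffoli (CCNOT) gate: flip c only when both controls a and b are 1."""
    return a, b, c ^ (a & b)

def cnot(a, b, c):
    """CNOT on the first two wires: b is flipped when a is 1; c passes through."""
    return a, b ^ a, c

def apply_circuit(gates, bits):
    """Apply a cascade of 3-wire gates from left to right."""
    for gate in gates:
        bits = gate(*bits)
    return bits

def is_reversible(gates, width=3):
    """A circuit is reversible iff the 2**width inputs map to distinct outputs."""
    inputs = list(product((0, 1), repeat=width))
    outputs = {apply_circuit(gates, x) for x in inputs}
    return len(outputs) == len(inputs)

# A cascade of reversible gates is itself reversible.
circuit = [toffoli, cnot, toffoli]
print(is_reversible(circuit))

# By contrast, overwriting the third wire with AND(a, b) loses information.
print(is_reversible([lambda a, b, c: (a, b, a & b)]))
```

Each gate here is its own inverse, so running the cascade in reverse order undoes the computation.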

arXiv.org e-Print Archive

Crossref

Similar operation template attack on RSA-CRT as a case study

Author: Dawu Gu
Haihua Gu
Junrong Liu
Kaiyu Zhang
Lei Wang
Sen Xu
Weijia Wang
Xiangjun Lu
Yang Li
Zheng Guo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2018

A template attack, one of the most powerful side-channel attack methods, usually first builds the leakage profiles from a controlled profiling device, and then uses these profiles to recover the secret of the target device. It is based on the fact that the profiling device shares similar leakage characteristics with the target device. In this study, we focus on the similar operations in a single device and propose a new variant of the template attack, called the similar operation template attack (SOTA). SOTA builds the models on public variables (e.g., input/output) and recovers the values of the secret variables that leak similarly to the public variables. SOTA's advantage is that it avoids the requirement of an additional profiling device. In this study, the proposed SOTA method is applied to a straightforward RSA-CRT implementation. Because the leakage is (almost) the same in similar operations, we reduce the security of RSA-CRT to a hidden multiplier problem (HMP) over GF(q), which can be solved byte-wise using our proposed heuristic algorithm. The effectiveness of our proposed method is verified as an entire prime recovery procedure in a practical leakage scenario.
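
For readers unfamiliar with the target of the attack, the sketch below shows textbook RSA-CRT decryption: two half-size exponentiations modulo the secret primes followed by a recombination step. It is a generic illustration, not the implementation studied in the paper; the function name rsa_crt_decrypt and the toy parameters are assumptions made for this example, and the code has no padding or side-channel protection.

```python
def rsa_crt_decrypt(c, p, q, d):
    """Decrypt c with textbook RSA-CRT (no padding, no side-channel defenses)."""
    dp = d % (p - 1)              # reduced private exponent modulo p - 1
    dq = d % (q - 1)              # reduced private exponent modulo q - 1
    q_inv = pow(q, -1, p)         # q^-1 mod p (requires Python 3.8+)
    m1 = pow(c, dp, p)            # half-size exponentiation modulo p
    m2 = pow(c, dq, q)            # half-size exponentiation modulo q
    h = (q_inv * (m1 - m2)) % p   # Garner recombination: multiply modulo p
    return m2 + h * q             # multiply by the secret prime q

# Toy parameters, far too small for real use.
p, q, e, d = 61, 53, 17, 2753
n = p * q
message = 65
ciphertext = pow(message, e, n)
print(rsa_crt_decrypt(ciphertext, p, q, d) == message)
```

The recombination step multiplies by secret-dependent values, which is the general kind of operation a side-channel attacker would try to observe.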

Creative Repository of Electro-Communications

Faster 64-bit universal hashing using carry-less multiplications

Author: Kaser Owen
Lemire Daniel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/11/2015

Intel and AMD support the Carry-less Multiplication (CLMUL) instruction set in their x64 processors. We use CLMUL to implement an almost universal 64-bit hash family (CLHASH). We compare this new family with what might be the fastest almost universal family on x64 processors (VHASH). We find that CLHASH is at least 60% faster. We also compare CLHASH with a popular hash function designed for speed (Google's CityHash). We find that CLHASH is 40% faster than CityHash on inputs larger than 64 bytes and just as fast otherwise.
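
To make the term concrete, a carry-less multiplication is a polynomial multiplication over GF(2): partial products are combined with XOR rather than addition, so no carries propagate. The following is a minimal software sketch of that operation, written for illustration only. The function name clmul is an assumption of this example, and the code is not CLHASH itself, which builds a hash family on top of the hardware instruction.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of non-negative integers a and b.

    Treats a and b as polynomials over GF(2), one coefficient per bit.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a   # XOR instead of add: no carries propagate
        a <<= 1           # next partial product is a shifted copy of a
        b >>= 1
    return result

# (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b11 * 0b11 -> 0b101
print(bin(clmul(0b11, 0b11)))
```

On hardware with CLMUL support, the same result for 64-bit operands comes from a single instruction rather than a loop.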

arXiv.org e-Print Archive

R-libre

Strongly universal string hashing is fast

Author: Kaser Owen
Lemire Daniel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/11/2014

We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycle per byte. Maybe surprisingly, we find that these families, though they require a large buffer of random numbers, are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom is that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families. Comment: Software is available at http://code.google.com/p/variablelengthstringhashing/ and https://github.com/lemire/StronglyUniversalStringHashin
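
The sketch below illustrates one way a family that relies on "a large buffer of random numbers" can look: a multilinear hash that multiplies each input character by its own random key, accumulates modulo 2^64, and keeps the high 32 bits. The word sizes (32-bit characters, 64-bit keys), the function names, and the seeded generator are assumptions made for this example. It is not the authors' released software, and it does not handle the padding needed for variable-length strings.

```python
import random

MASK64 = (1 << 64) - 1

def make_keys(n, seed=0):
    """Draw n + 1 random 64-bit keys, one more than the number of characters.

    random.Random is used only to keep the sketch reproducible; a real
    deployment would need a proper source of randomness.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(64) for _ in range(n + 1)]

def multilinear_hash(chars, keys):
    """Hash a sequence of 32-bit characters to a 32-bit value."""
    if len(chars) + 1 > len(keys):
        raise ValueError("need one more key than characters")
    acc = keys[0]
    for key, ch in zip(keys[1:], chars):
        # One multiplication per character, accumulated modulo 2**64.
        acc = (acc + key * (ch & 0xFFFFFFFF)) & MASK64
    return acc >> 32  # keep the high 32 bits

keys = make_keys(4)
print(multilinear_hash([1, 2, 3, 4], keys))
```

Because the multiplications are independent of one another, a processor can overlap them, which is consistent with the abstract's remark about operation pipelining.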

arXiv.org e-Print Archive

R-libre

Crossref

Versatile Montgomery Multiplier Architectures

Author: Gaubatz Gunnar
Publication venue: Digital WPI
Publication date: 30/04/2002

Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, becomes more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery's method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2^n)). Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log_{3/2} n), while the delay of bit serial implementations grows with linear complexity O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5 μm CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption.
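
As background for Montgomery's method mentioned above, the following sketch shows the integer (GF(p)) form of Montgomery reduction and multiplication in software. It is a plain whole-operand version for illustration, not the word-based, high-radix hardware architecture of the thesis, and it does not cover GF(2^n). All function names are assumptions of this example.

```python
def montgomery_setup(n: int, bits: int):
    """Precompute constants for an odd modulus n and R = 2**bits > n."""
    if n % 2 == 0 or n >= (1 << bits):
        raise ValueError("n must be odd and smaller than 2**bits")
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime = -1 (mod R); Python 3.8+
    return r, n_prime

def redc(t: int, n: int, bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << bits) - 1
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by R
    u = (t + m * n) >> bits             # exact division by R is a shift
    return u - n if u >= n else u       # single conditional subtraction

def mont_mul(a: int, b: int, n: int, bits: int) -> int:
    """Compute a * b mod n by way of the Montgomery domain."""
    r, n_prime = montgomery_setup(n, bits)
    a_bar = (a * r) % n                                # into Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, bits, n_prime)   # equals a*b*R mod n
    return redc(prod_bar, n, bits, n_prime)            # back to normal form

print(mont_mul(7, 5, 13, 4))  # 35 mod 13 = 9
```

The appeal for hardware is that the reduction uses only multiplications, additions, masking, and shifts by a power of two, with no trial division by the modulus.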

DigitalCommons@WPI