83 research outputs found

    A Memory Encryption Engine Suitable for General Purpose Processors

    Get PDF
    Cryptographic protection of memory is an essential ingredient for any technology that allows a closed computing system to run software in a trustworthy manner and handle secrets, while its external memory is susceptible to eavesdropping and tampering. An example for such a technology is Intel\u27s emerging Software Guard Extensions technology (Intel SGX) that appears in the latest processor generation, Architecture Codename Skylake. This technology operates under the assumption that the security perimeter includes only the internals of the CPU package, and in particular, leaves the DRAM untrusted. It is supported by an autonomous hardware unit called the Memory Encryption Engine (MEE), whose role is to protect the confidentiality, integrity, and freshness of the CPU-DRAM traffic over some memory range. To succeed in adding this unit to the micro architecture of a general purpose processor product, it must be designed under very strict engineering constraints. This requires a careful combination of cryptographic primitives operating over a customized integrity tree that mostly resides on the DRAM while relying only on a small internally stored root. The purpose of this paper is to explain how this hardware component of SGX works, and the rationale behind some of its design choices. To this end, we formalize the MEE threat model and security objectives, describe the MEE design, cryptographic properties, security margins, and report some concrete performance results

    A j-lanes tree hashing mode and j-lanes SHA-256

    Get PDF
    j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation Intel® Core™. For message ranging 2KB to 132KB in length, the 4-lanes SHA-256 is between 1.5 to 1.97 times faster than the fastest publicly available implementation (that we are aware of), and between 1.9 to 2.5 times faster than OpenSSL 1.0.1c. For long messages, there is no significant performance difference between different choices of j. We show that the 4-lanes SHA-256 is faster than the two SHA3 finalists (BLAKE and Keccak) that have a published tree mode implementation. We explain why j-lanes hashing will be even faster on the future AVX2 architecture with 256 bits registers. This suggests that standardizing a tree mode for hash functions (SHA-256 in particular) would deliver significant performance benefits for a multitude of algorithms and usages

    Efficient Software Implementations of Modular Exponentiation

    Get PDF
    RSA computations have a significant effect on the workloads of SSL/TLS servers, and therefore their software implementations on general purpose processors are an important target for optimization. We concentrate here on 512-bit modular exponentiation, used for 1024-bit RSA. We propose optimizations in two directions. At the primitives’ level, we study and improve the performance of an “Almost” Montgomery Multiplication. At the exponentiation level, we propose a method to reduce the cost of protecting the w-ary exponentiation algorithm against cache/timing side channel attacks. Together, these lead to an efficient software implementation of 512-bit modular exponentiation, which outperforms the currently fastest publicly available alternative. When measured on the latest x86-64 architecture, the 2nd Generation Intel® Core™ processor, our implementation is 43% faster than that of the current version of OpenSSL (1.0.0d)

    Key Committing AEADs

    Get PDF
    This note describes some methods for adding a key commitment property to a generic (nonce-based) AEAD scheme. We analyze the the privacy bounds and key commitment guarantee of the resulting constructions, by expressing them in terms of the properties of the underlying AEAD scheme and the added key commitment primitive. We also offer concrete constructions for a key committing version of AES-GCM

    Parallelized hashing via j-lanes and j-pointers tree modes, with applications to SHA-256

    Get PDF
    The j-lanes tree hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. The j-pointers tree hashing is a similar tree mode that receives, as input, j pointers to j messages (or slices of a single message), computes their digests and outputs the hash value of their concatenation. Such modes have parallelization capabilities on a hashing process that is serial by nature. As a result, they have performance advantage on modern processor architectures. This paper provides precise specifications for these hashing modes, proposes a setup for appropriate IV’s definition, and demonstrates their performance on the latest processors. Our hope is that it would be useful for standardization of these modes

    Vortex: A New Family of One Way Hash Functions based on Rijndael Rounds and Carry-less Multiplication

    Get PDF
    We present Vortex a new family of one way hash functions that can produce message digests of 224, 256, 384 and 512 bits. The main idea behind the design of these hash functions is that we use well known algorithms that can support very fast diffusion in a small number of steps. We also balance the cryptographic strength that comes from iterating block cipher rounds with SBox substitution and diffusion (like Whirlpool) against the need to have a lightweight implementation with as small number of rounds as possible. We use a variable number of Rijndael rounds with a stronger key schedule. Our goal is not to protect a secret symmetric key but to support perfect mixing of the bits of the input into the hash value. Rijndael rounds are followed by our variant of Galois Field multiplication. This achieves cross-mixing between 128-bit or 256-bit sets. Our hash function uses the Enveloped Merkle-Damgard construction to support properties such as collision resistance, first and second pre-image resistance, pseudorandom oracle preservation and pseudorandom function preservation. We provide analytical results that demonstrate that the number of queries required for finding a collision with probability greater or equal to 0.5 in an ideal block cipher approximation of Vortex 256 is at least 1.18x2^122.55 if the attacker uses randomly selected message words. We also provide experimental results that indicate that the compression function of Vortex is not inferior to that of the SHA family regarding its capability to preserve the pseudorandom oracle property. We list a number of well known attacks and discuss how the Vortex design addresses them. The main strength of the Vortex design is that this hash function can demonstrate an expected performance of 2.2-2.5 cycles per byte in future processors with instruction set support for Rijndael rounds and carry-less multiplication. We provide arguments why we believe this is a trend in the industry. We also discuss how optimized assembly code can be written that demonstrates such performance

    SimpleENC and SimpleENCsmall -- an Authenticated Encryption Mode for the Lightweight Setting

    Get PDF
    Block cipher modes of operation provide a way to securely encrypt using a block cipher, and different modes of operation achieve different tradeoffs of security, performance and simplicity. In this paper, we present a new authenticated encryption scheme that is designed for the lightweight cryptography setting, but can be used in standard settings as well. Our mode of encryption is extremely simple, requiring only a single block cipher primitive (in forward direction) and minimal padding, and supports streaming (online encryption). In addition, our mode achieves very strong security bounds, and can even provide good security when the block size is just 64 bits. As such, it is highly suitable for lightweight settings, where the lifetime of the key and/or overall amount encrypted may be high. Our new scheme can be seen as an improved version of CCM that supports streaming, and provides much better bounds

    Distinguishing a truncated random permutation from a random function

    Get PDF
    An oracle chooses a function f from the set of n bits strings to itself, which is either a randomly chosen permutation or a randomly chosen function. When queried by an n-bit string w, the oracle computes f(w), truncates the m last bits, and returns only the first n-m bits of f(w). How many queries does a querying adversary need to submit in order to distinguish the truncated permutation from a random function? In 1998, Hall et al. showed an algorithm for determining (with high probability) whether or not f is a permutation, using O ( 2^((m+n)/2) ) queries. They also showed that if m n/7, their method gives a weaker bound. In this manuscript, we show how a modification of the method used by Hall et al. can solve the porblem completely. It extends the result to essentially every m, showing that Omega ( 2^((m+n)/2) ) queries are needed to get a non-negligible distinguishing advantage. We recently became aware that a better bound for the distinguishing advantage, for every m<n, follows from a result of Stam published, in a different context, already in 1978
    • …
    corecore