Revisiting a Masked Lookup-Table Compression Scheme
Lookup-table-based side-channel countermeasures are the prime choice for masked S-box software implementations at very low orders. To mask an n-bit to m-bit S-box at first and second order, one requires a temporary table in RAM of size m * 2^n bits. Recently, Vadnala (CT-RSA 2017) suggested masked table compression schemes at first and second order that reduce the table size by (approximately) a factor of 2^l, where l is a compression parameter. Though greater compression results in greater execution time, these proposals remain attractive for highly resource-constrained devices.
In this work, we contradict the second-order security claim of Vadnala's second-order table compression scheme. We do this by exhibiting several pairs of intermediate variables that jointly depend on the bits of the secret. Motivated by the fact that randomness is also a costly resource for highly resource-constrained devices, we then propose a variant of Vadnala's first-order table compression scheme with reduced randomness complexity compared to the original proposal. We achieve this without inducing any noticeable difference in the overall execution time or memory requirement of the original scheme. Finally, we show that the randomness complexity of our variant is optimal in an algebraic sense.
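Both abstracts build on the classic randomised-table countermeasure. The following is a minimal first-order sketch in Python; the 4-bit S-box and all names are illustrative, and this is the plain (uncompressed) table, not Vadnala's compressed variant:

```python
import secrets

# Illustrative 4-bit S-box (n = m = 4); any (n, m)-bit table works the same way.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 4

def build_masked_table(r_in, r_out):
    """First-order randomised table: T[a] = SBOX[a ^ r_in] ^ r_out.

    This is the 2^n-entry RAM table (m * 2^n bits in total) whose size
    the compression schemes discussed above aim to reduce.
    """
    return [SBOX[a ^ r_in] ^ r_out for a in range(1 << N)]

# The secret x is only ever handled in masked form (x_masked, r_in).
x = 0x7
r_in = secrets.randbelow(1 << N)
r_out = secrets.randbelow(1 << N)
x_masked = x ^ r_in

T = build_masked_table(r_in, r_out)
y_masked = T[x_masked]            # equals SBOX[x] ^ r_out; SBOX[x] never appears
assert y_masked ^ r_out == SBOX[x]
```

Each individual intermediate (index, table entry, mask) is uniformly distributed, which is what first-order probing security requires; the attack described above shows that *pairs* of intermediates in the compressed second-order scheme are not similarly independent of the secret.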
Second-Order Masked Lookup Table Compression Scheme
Masking by lookup-table randomisation is a well-known technique for achieving side-channel attack resistance in software implementations, particularly against DPA attacks. The randomised-table technique for first- and second-order security requires about m * 2^n bits of RAM to store an (n, m)-bit masked S-box lookup table. Table compression reduces the amount of memory required, which is useful for highly resource-constrained IoT devices. Recently, Vadnala (CT-RSA 2017) proposed a randomised table compression scheme for first- and second-order security in the probing leakage model. This scheme reduces the RAM required by about a factor of 2^l, where l is a compression parameter. Vivek (INDOCRYPT 2017) demonstrated an attack against Vadnala's second-order scheme. Hence, achieving secure table compression at second and higher orders has remained an open problem.
In this work, we propose a second-order secure randomised table compression scheme that works for any (n, m)-bit S-box. Our proposal is a variant of Vadnala's scheme that is not only secure but also significantly improves the time-memory trade-off: specifically, we improve the online execution time by a factor of 2^(n-l). Our scheme is proven 2-SNI secure in the probing leakage model. We have implemented our method for AES-128 on a 32-bit ARM Cortex processor and are able to reduce the memory required to store a randomised S-box table for a second-order AES-128 implementation to 59 bytes.
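For contrast, a generic second-order randomised-table lookup can be sketched as follows (illustrative Python; this is neither Vadnala's scheme nor the compressed variant proposed here, and the caveat in the comments is exactly why careful constructions are needed):

```python
import secrets

# Illustrative 4-bit S-box (n = m = 4).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 4

# Second-order Boolean sharing of the secret input: x = x0 ^ x1 ^ x2.
x = 0x7
x1 = secrets.randbelow(1 << N)
x2 = secrets.randbelow(1 << N)
x0 = x ^ x1 ^ x2

# Fresh output masks for the shared result.
s1 = secrets.randbelow(1 << N)
s2 = secrets.randbelow(1 << N)

# Randomised table indexed by the share x0: every entry is re-masked, so no
# single index or entry depends on x by itself.  CAVEAT: the naive expression
# `a ^ x1 ^ x2` manipulates two shares jointly, which is precisely the kind of
# intermediate a second-order probing adversary can exploit; secure schemes
# build the table without ever combining shares directly.
T = [SBOX[a ^ x1 ^ x2] ^ s1 ^ s2 for a in range(1 << N)]

y0 = T[x0]                        # equals SBOX[x] ^ s1 ^ s2
assert y0 ^ s1 ^ s2 == SBOX[x]    # (y0, s1, s2) is a sharing of SBOX[x]
```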
High Performance Construction of RecSplit Based Minimal Perfect Hash Functions
A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space-efficient practical minimal perfect hash function. It heavily relies on trying out hash functions in a brute-force way.
We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs.
In combination, the resulting improvements yield speedups of up to 239x on an 8-core CPU and up to 5438x using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 million objects at 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is also more energy-efficient than the original.
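The brute-force ingredient the abstract refers to can be sketched as a toy seed search in Python (this illustrates only the expensive inner step, not RecSplit's recursive splitting; all names are illustrative):

```python
import hashlib

def h(seed: int, key: str, n: int) -> int:
    """Toy seeded hash mapping a key into [0, n)."""
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % n

def find_mphf_seed(keys):
    """Brute-force search for a seed making h(seed, ., n) a bijection on `keys`.

    RecSplit applies this kind of search to small buckets; techniques like
    rotation fitting and bit/vector/GPU parallelism attack exactly this loop.
    """
    n = len(keys)
    seed = 0
    while True:
        if len({h(seed, k, n) for k in keys}) == n:   # all slots distinct
            return seed
        seed += 1

keys = ["apple", "banana", "cherry", "date", "elderberry"]
seed = find_mphf_seed(keys)
# The found seed plus the hash function form a minimal perfect hash for `keys`.
assert sorted(h(seed, k, len(keys)) for k in keys) == list(range(len(keys)))
```

For a bucket of size n the expected number of trials grows like n^n / n!, which is why shrinking the search space matters so much for construction time.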
MOSFHET: Optimized Software for FHE over the Torus
Homomorphic encryption is one of the most secure solutions for processing sensitive information in untrusted environments, and there have been many recent advances towards its efficient implementation for the evaluation of linear functions and approximate arithmetic. However, practical performance when evaluating arbitrary (non-linear) functions is still a major challenge for HE schemes. The TFHE scheme [Chillotti et al., 2016] is the current state of the art for the evaluation of arbitrary functions, and in this work we focus on improving its performance. We divide this paper into two parts. First, we review and implement the main techniques proposed so far to improve performance or error behavior in TFHE; for many of them, ours is the first practical implementation. Then, we introduce novel improvements to several of them and new approaches to implementing some commonly used procedures. We also show which proposals can be suitably combined to achieve better results. We provide a single library containing all the reviewed techniques as well as our original contributions. Our implementation is up to 1.2 times faster than previous ones with a similar optimization level, and our novel techniques provide speedups of up to 2.83 times on algorithms such as the Full-Domain Functional Bootstrap (FDFB).
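The functional-bootstrap machinery that such optimizations target ultimately evaluates a lookup table by rotating a "test polynomial". The selection step can be illustrated in the clear (no encryption; a plaintext sketch of the rotation algebra, with illustrative names):

```python
# A lookup table is stored in the coefficients of a test polynomial over
# Z[X]/(X^N + 1); multiplying by X^{-k} brings entry k into the constant
# coefficient.  In TFHE itself, k is derived from an encrypted input and the
# rotation is performed homomorphically ("blind rotation").
N = 8

def times_x_pow_minus_k(v, k):
    """Multiply coefficient vector v by X^{-k} in Z[X]/(X^N + 1)."""
    n = len(v)
    out = [0] * n
    for i in range(n):
        j = (i - k) % (2 * n)
        if j < n:
            out[j] += v[i]        # exponent stays inside the ring
        else:
            out[j - n] -= v[i]    # X^N = -1: wrap-around flips the sign
    return out

table = [3, 1, 4, 1, 5, 9, 2, 6]  # an arbitrary 8-entry lookup table

for k in range(N):
    assert times_x_pow_minus_k(table, k)[0] == table[k]
# Rotating past N returns the *negated* entries: plain bootstrapping natively
# evaluates only negacyclic functions, which is why full-domain variants such
# as FDFB are needed for arbitrary ones.
for k in range(N, 2 * N):
    assert times_x_pow_minus_k(table, k)[0] == -table[k - N]
```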
Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information
The state-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in terms of rate-distortion (RD) performance in certain cases. However, utilizing them in practical applications is still challenging for two major reasons: 1) cross-platform computational errors resulting from floating-point operations can lead to inaccurate decoding of the bitstream, and 2) the high computational complexity of the encoding and decoding process poses a challenge to achieving real-time performance.
In this paper, we propose a real-time cross-platform neural video codec, which is capable of efficient decoding of 720p video bitstreams from other encoding platforms on a consumer-grade GPU. First, to solve the codec inconsistency caused by the uncertainty of floating-point calculations across platforms, we design a calibration transmitting system to guarantee consistent quantization of entropy parameters between the encoding and decoding stages. The parameters that may have transboundary quantization between encoding and decoding are identified in the encoding stage, and their coordinates are delivered in an auxiliary transmitted bitstream, so that these inconsistent parameters can be handled properly in the decoding stage. Furthermore, to reduce the bitrate of the auxiliary bitstream, we rectify the distribution of entropy parameters using a piecewise Gaussian constraint. Second, to match the computational limitations on the decoding side for a real-time video codec, we design a lightweight model. A series of efficiency techniques enable our model to achieve a decoding speed of 25 FPS on an NVIDIA RTX 2080 GPU. Experimental results demonstrate that our model achieves real-time decoding of 720p videos encoded on another platform, and brings up to 24.2% BD-rate improvement in terms of PSNR against the H.265 anchor.
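The calibration idea can be sketched as follows (a simplified, hypothetical reconstruction in Python; the step size, threshold, and snap-to-floor rule are illustrative, not the paper's exact mechanism):

```python
import math
import random

Q = 0.1      # quantization step for entropy parameters (illustrative)
EPS = 1e-3   # how close to a rounding boundary counts as "risky" (illustrative)
ERR = 1e-5   # simulated magnitude of cross-platform floating-point error

def risky(v):
    """True if v sits so close to a rounding boundary that a tiny
    cross-platform error could flip its quantized value."""
    t = v / Q
    return abs(t - math.floor(t) - 0.5) < EPS

def quantize(v, forced_floor=False):
    """Round-half-up quantizer; flagged parameters are instead rounded in an
    agreed direction (down), so encoder and decoder quantize them identically."""
    t = v / Q
    return math.floor(t) if forced_floor else math.floor(t + 0.5)

random.seed(0)
enc_params = [random.uniform(-1, 1) for _ in range(10_000)]

# Encoder identifies risky coordinates and ships them as auxiliary bitstream.
flags = {i for i, v in enumerate(enc_params) if risky(v)}

# Decoder recomputes the parameters with platform-dependent error.
dec_params = [v + random.uniform(-ERR, ERR) for v in enc_params]

enc_q = [quantize(v, i in flags) for i, v in enumerate(enc_params)]
dec_q = [quantize(v, i in flags) for i, v in enumerate(dec_params)]
assert enc_q == dec_q   # consistent entropy parameters -> decodable bitstream
```

Only the flagged coordinates cost auxiliary bits, which is why concentrating parameters away from bin boundaries (as the piecewise Gaussian constraint does) directly shrinks the auxiliary bitstream.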