275 research outputs found

    Revisiting a Masked Lookup-Table Compression Scheme

    Lookup-table based side-channel countermeasures are the prime choice for masked S-box software implementations at very low orders. To mask an n-bit to m-bit S-box at first and second order, one requires a temporary table in RAM of size m * 2^n bits. Recently, Vadnala (CT-RSA 2017) suggested masked table compression schemes at first and second order to reduce the table size by (approximately) a factor of 2^l, where l is a parameter. Though greater compression results in greater execution time, these proposals would still be attractive for highly resource-constrained devices. In this work, we contradict the second-order security claim of the second-order table compression scheme by Vadnala. We do this by exhibiting several pairs of intermediate variables that jointly depend on the bits of the secret. Motivated by the fact that randomness is also a costly resource for highly resource-constrained devices, we then propose a variant of the first-order table compression scheme of Vadnala that has a randomness complexity of about l instead of 2^l for the original proposal. We achieve this without inducing any noticeable difference in the overall execution time or memory requirement of the original scheme. Finally, we show that the randomness complexity of l is optimal in an algebraic sense.
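
    A minimal sketch of the classic first-order randomised-table countermeasure that such compression schemes start from (not Vadnala's compressed variant); the toy 4-bit SBOX and variable names are illustrative placeholders:

        import secrets

        # Toy 4-bit S-box used only for illustration (n = m = 4).
        SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
        n = 4  # input width in bits

        def build_masked_table(r_in, r_out):
            # Randomised table T with T[x] = SBOX[x ^ r_in] ^ r_out.
            # Storing T costs m * 2^n bits of RAM, which is exactly the
            # figure the compression schemes aim to shrink by roughly 2^l.
            return [SBOX[x ^ r_in] ^ r_out for x in range(1 << n)]

        r_in = secrets.randbelow(1 << n)
        r_out = secrets.randbelow(1 << n)
        T = build_masked_table(r_in, r_out)

        x = 0x9                   # secret S-box input
        x_masked = x ^ r_in       # only the masked share is ever looked up
        y_masked = T[x_masked]    # equals SBOX[x] ^ r_out
        assert y_masked ^ r_out == SBOX[x]

    Here the table costs m * 2^n = 4 * 16 = 64 bits; the schemes discussed above trade extra online computation and randomness for a table roughly 2^l times smaller.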

    Second-Order Masked Lookup Table Compression Scheme

    Masking by lookup table randomisation is a well-known technique used to achieve side-channel attack resistance for software implementations, particularly against DPA attacks. The randomised table technique for first- and second-order security requires about m * 2^n bits of RAM to store an (n, m)-bit masked S-box lookup table. Table compression helps reduce the amount of memory required, which is useful for highly resource-constrained IoT devices. Recently, Vadnala (CT-RSA 2017) proposed a randomised table compression scheme for first- and second-order security in the probing leakage model. This scheme reduces the RAM required by about a factor of 2^l, where l is a compression parameter. Vivek (Indocrypt 2017) demonstrated an attack against the second-order scheme of Vadnala; hence achieving table compression at second and higher orders is an open problem. In this work, we propose a second-order secure randomised table compression scheme which works for any (n, m)-bit S-box. Our proposal is a variant of Vadnala's scheme that is not only secure but also significantly improves the time-memory trade-off. Specifically, we improve the online execution time by a factor of 2^(n-l). Our proposed scheme is proved 2-SNI secure in the probing leakage model. We have implemented our method for AES-128 on a 32-bit ARM Cortex processor. We are able to reduce the memory required to store a randomised S-box table for a second-order AES-128 implementation to 59 bytes.
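
    A back-of-envelope illustration (not the paper's exact accounting) of the memory figures involved: for the AES S-box (n = m = 8), the uncompressed randomised table costs m * 2^n = 2048 bits, i.e. 256 bytes, and the compression parameter l shrinks this by roughly 2^l while the online cost grows accordingly:

        # Approximate RAM for a compressed randomised (n, m)-bit S-box table.
        # Illustrative arithmetic only; the 59-byte figure reported above
        # depends on the scheme's exact bookkeeping, not just this formula.
        n, m = 8, 8  # AES S-box dimensions

        def table_bytes(n, m, l):
            return (m * 2**n) // (2**l * 8)

        for l in range(0, 6):
            print(f"l={l}: ~{table_bytes(n, m, l)} bytes of table RAM")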

    High Performance Construction of RecSplit Based Minimal Perfect Hash Functions

    A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space-efficient practical minimal perfect hash function. It heavily relies on trying out hash functions in a brute-force way. We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism at the level of bits, vectors, cores, and GPUs. In combination, the resulting improvements yield speedups of up to 239 on an 8-core CPU and up to 5438 using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 million objects with 1.56 bits per object. On the GPU, we achieve the same space usage in just 5 seconds. Given that the speedups are larger than the increase in energy consumption, our implementation is more energy efficient than the original implementation.
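
    A minimal sketch of the brute-force ingredient RecSplit builds on: for a small bucket of keys, seeds of a hash function are tried until one maps the keys bijectively onto {0, ..., |bucket|-1}, and only the successful seed is stored. This toy version uses Python's hashlib and is not the RecSplit algorithm (nor rotation fitting) itself:

        import hashlib

        def h(key: str, seed: int, size: int) -> int:
            # Toy seeded hash; real MPHF constructions use far cheaper hashes.
            data = f"{seed}:{key}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            return int.from_bytes(digest, "big") % size

        def find_bijective_seed(keys):
            # Brute force: try seeds until the bucket hashes without collisions.
            size = len(keys)
            seed = 0
            while True:
                if len({h(k, seed, size) for k in keys}) == size:
                    return seed  # minimal perfect on this bucket
                seed += 1

        bucket = ["apple", "pear", "plum", "fig", "quince"]
        seed = find_bijective_seed(bucket)
        print(seed, sorted(h(k, seed, len(bucket)) for k in bucket))

    This seed search is what dominates construction time, which is why reducing the number of tried hash functions and parallelising the search pay off so strongly.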

    MOSFHET: Optimized Software for FHE over the Torus

    Homomorphic encryption is one of the most secure solutions for processing sensitive information in untrusted environments, and there have been many recent advances towards its efficient implementation for the evaluation of linear functions and approximated arithmetic. However, practical performance when evaluating arbitrary (nonlinear) functions is still a major challenge for HE schemes. The TFHE scheme [Chillotti et al., 2016] is the current state of the art for the evaluation of arbitrary functions, and in this work we focus on improving its performance. We divide this paper into two parts. First, we review and implement the main techniques proposed so far to improve performance or error behavior in TFHE. For many of them, this is the first practical implementation. Then, we introduce novel improvements to several of them and new approaches to implementing some commonly used procedures. We also show which proposals can be suitably combined to achieve better results. We provide a single library containing all the reviewed techniques as well as our original contributions. Our implementation is up to 1.2 times faster than previous ones with a similar optimization level, and our novel techniques provide speedups of up to 2.83 times on algorithms such as the Full-Domain Functional Bootstrap (FDFB).
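
    A toy, plaintext-only model of the mechanism behind TFHE-style functional bootstrapping (no encryption, not the MOSFHET API): a test vector encodes the target function, and the bootstrap extracts one of its entries selected by a phase in Z_{2N}. Because the underlying ring satisfies X^N = -1, phases in [N, 2N) return the negated entry; that negacyclic restriction is what full-domain approaches such as FDFB are designed to remove:

        N = 8
        f = lambda x: x % 5                 # arbitrary function on [0, N)
        tv = [f(i) for i in range(N)]       # test vector encoding f

        def rotated_constant_term(tv, phase):
            # Constant term of tv(X) * X^(-phase) mod (X^N + 1), phase in [0, 2N).
            N = len(tv)
            return tv[phase] if phase < N else -tv[phase - N]

        for phase in range(2 * N):
            print(phase, rotated_constant_term(tv, phase))
        # Entries for phase >= N come out negated: a single bootstrap only
        # evaluates functions satisfying f(x + N) = -f(x) "for free".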

    Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information

    The state-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in terms of RD performance in certain cases. However, utilizing them for practical applications is still challenging for two major reasons. 1) Cross-platform computational errors resulting from floating-point operations can lead to inaccurate decoding of the bitstream. 2) The high computational complexity of the encoding and decoding process poses a challenge in achieving real-time performance. In this paper, we propose a real-time cross-platform neural video codec that is capable of efficiently decoding a 720P video bitstream from other encoding platforms on a consumer-grade GPU. First, to solve the problem of codec inconsistency caused by the uncertainty of floating-point calculations across platforms, we design a calibration transmitting system to guarantee consistent quantization of entropy parameters between the encoding and decoding stages. The parameters that may have transboundary quantization between encoding and decoding are identified in the encoding stage, and their coordinates are delivered via an auxiliary transmitted bitstream. By doing so, these inconsistent parameters can be processed properly in the decoding stage. Furthermore, to reduce the bitrate of the auxiliary bitstream, we rectify the distribution of entropy parameters using a piecewise Gaussian constraint. Second, to meet the computational limitations on the decoding side for a real-time video codec, we design a lightweight model. A series of efficiency techniques enable our model to achieve 25 FPS decoding speed on an NVIDIA RTX 2080 GPU. Experimental results demonstrate that our model can achieve real-time decoding of 720P videos while encoding on another platform. Furthermore, the real-time model brings up to 24.2% BD-rate improvement from the perspective of PSNR with the anchor H.265.
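
    A schematic sketch of the calibration idea described above (not the paper's actual system): on the encoder side, entropy parameters whose pre-quantization values lie within a small tolerance of a rounding boundary are the ones that cross-platform floating-point differences could quantize differently, so their coordinates would be flagged for the auxiliary bitstream. The quantization step and tolerance below are illustrative choices:

        import numpy as np

        def flag_transboundary(params, step=1.0 / 16, tol=1e-3):
            # Coordinates of parameters within `tol` of a boundary of the
            # quantiser round(x / step); these could round differently on
            # another platform and would be signalled as side information.
            scaled = np.asarray(params, dtype=np.float64) / step
            frac = scaled - np.floor(scaled)
            return np.argwhere(np.abs(frac - 0.5) < tol)

        # Encoder flags risky coordinates; the decoder then resolves those
        # coordinates deterministically instead of trusting its own floats.
        sigma = np.array([[0.031251, 0.0450],
                          [0.093749, 0.2031]])
        print(flag_transboundary(sigma))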