17 research outputs found

    Improved Factorization of N=prqsN=p^rq^s

    Get PDF
    Bones et al. showed at Crypto 99 that moduli of the form N=prqN=p^rq can be factored in polynomial time when rlogpr \geq \log p. Their algorithm is based on Coppersmith\u27s technique for finding small roots of polynomial equations. Recently, Coron et al. showed that N=prqsN=p^rq^s can also be factored in polynomial time, but under the stronger condition rlog3pr \geq \log^3 p. In this paper, we show that N=prqsN=p^rq^s can actually be factored in polynomial time when rlogpr \geq \log p, the same condition as for N=prqN=p^rq

    Randomness Generation for Secure Hardware Masking - Unrolled Trivium to the Rescue

    Get PDF
    Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed over the past decades. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are available, without taking the cost of their generation into account, leads to a poor understanding of the efficiency and performance of secure implementations. This is especially relevant in case of hardware masking schemes which are known to consume large amounts of random bits per cycle due to parallelism. Currently, there seems to be no consensus on how to most efficiently derive many pseudo-random bits per clock cycle from an initial seed and with properties suitable for masked hardware implementations. In this work, we evaluate a number of building blocks for this purpose and find that hardware-oriented stream ciphers like Trivium and its reduced-security variant Bivium B outperform all competitors when implemented in an unrolled fashion. Unrolled implementations of these primitives enable the flexible generation of many bits per cycle while maintaining high performance, which is crucial for satisfying the large randomness demands of state-of-the-art masking schemes. According to our analysis, only Linear Feedback Shift Registers (LFSRs), when also unrolled, are capable of producing long non-repetitive sequences of random-looking bits at a high rate per cycle even more efficiently than Trivium and Bivium B. Yet, these instances do not provide black-box security as they generate only linear outputs. We experimentally demonstrate that using multiple output bits from an LFSR in the same masked implementation can violate probing security and even lead to harmful randomness cancellations. Circumventing these problems, and enabling an independent analysis of randomness generation and masking scheme, requires the use of cryptographically stronger primitives like stream ciphers. As a result of our studies, we provide an evidence-based estimate for the cost of securely generating n fresh random bits per cycle. Depending on the desired level of black-box security and operating frequency, this cost can be as low as 20n to 30n ASIC gate equivalents (GE) or 3n to 4n FPGA look-up tables (LUTs), where n is the number of random bits required. Our results demonstrate that the cost per bit is (sometimes significantly) lower than estimated in previous works, incentivizing parallelism whenever exploitable and potentially moving low randomness usage in hardware masking research from a primary to secondary design goal

    Factoring N=prqsN=p^r q^s for Large rr and ss

    Get PDF
    International audienceBoneh et al. showed at Crypto 99 that moduli of the form N = p^r q can be factored in polynomial time when r ≃ log(p). Their algorithm is based on Coppersmith’s technique for finding small roots of polynomial equations. In this paper we show that N = p^r q^s can also be factored in polynomial time when r or s is at least (log p)^3; therefore we identify a new class of integers that can be efficiently factored.We also generalize our algorithm to moduli with k prime factors N = \prod_{i=1}^k p_i^{r_i} ; we show that a non-trivial factor of N can be extracted in polynomial-time if one of the exponents r_i is large enough

    Multiple-Differential Mechanism for Collision-Optimized Divide-and-Conquer Attacks

    Get PDF
    Several combined attacks have shown promising results in recovering cryptographic keys by introducing collision information into divide-and-conquer attacks to transform a part of the best key candidates within given thresholds into a much smaller collision space. However, these Collision-Optimized Divide-and-Conquer Attacks (CODCAs) uniformly demarcate the thresholds for all sub-keys, which is unreasonable. Moreover, the inadequate exploitation of collision information and backward fault tolerance mechanisms of CODCAs also lead to low attack efficiency. Finally, existing CODCAs mainly focus on improving collision detection algorithms but lack theoretical basis. We exploit Correlation-Enhanced Collision Attack (CECA) to optimize Template Attack (TA). To overcome the above-mentioned problems, we first introduce guessing theory into TA to enable the quick estimation of success probability and the corresponding complexity of key recovery. Next, a novel Multiple-Differential mechanism for CODCAs (MD-CODCA) is proposed. The first two differential mechanisms construct collision chains satisfying the given number of collisions from several sub-keys with the fewest candidates under a fixed probability provided by guessing theory, then exploit them to vote for the remaining sub-keys. This guarantees that the number of remaining chains is minimal, and makes MD-CODCA suitable for very high thresholds. Our third differential mechanism simply divides the key into several large non-overlapping ``blocks\u27\u27 to further exploit intra-block collisions from the remaining candidates and properly ignore the inter-block collisions, thus facilitating the latter key enumeration. The experimental results show that MD-CODCA significantly reduces the candidate space and lowers the complexity of collision detection, without considerably reducing the success probability of attacks

    Efficient Statistical Asynchronous Verifiable Secret Sharing and Multiparty Computation with Optimal Resilience

    Get PDF
    Verifiable Secret Sharing (VSS) is a fundamental primitive used as a building block in many distributed cryptographic tasks, such as Secure Multiparty Computation (MPC) and Byzantine Agreement (BA). An important variant of VSS is Asynchronous VSS (AVSS) which is designed to work over asynchronous networks. AVSS is a two phase (Sharing, Reconstruction) protocol carried out among n parties in the presence of a computationally unbounded active adversary, who can corrupt up to t parties. We assume that every two parties in the network are directly connected by a pairwise secure channel. In this paper, we present a new statistical AVSS protocol with optimal resilience; i.e. with n = 3t+1. Our protocol privately communicates O((\ell n^3 + n^4 \log{\frac{1}{\epsilon}}) \log{\frac{1}{\epsilon}}) bits and A-casts O(n^3 \log(n)) bits to simultaneously share \ell \geq 1 elements from a finite field F, where \epsilon is the error parameter of our protocol. There are only two known statistical AVSS protocols with n = 3t+1 reported in [CR93] and [PCR09]. The AVSS protocol of [CR93] requires a private communication of O(n^9 (\log{\frac{1}{\epsilon}})^4) bits and A-cast of O(n^9 (\log{\frac{1}{\epsilon}})^2 \log(n)) bits to share a single element from F. Thus our AVSS protocol shows a significant improvement in communication complexity over the AVSS of [CR93]. The AVSS protocol of [PCR09] requires a private communication and A-cast of O((\ell n^3 + n^4) \log{\frac{1}{\epsilon}}) bits to share \ell \geq 1 elements. However, the shared element(s) may be NULL \not \in {\mathbb F}. Thus our AVSS is better than the AVSS of [PCR09] due to the following reasons: 1. The A-cast communication of our AVSS is independent of the number of secrets i.e. \ell; 2. Our AVSS makes sure that the shared value(s) always belong to F. Using our AVSS, we design a new primitive called Asynchronous Complete Secret Sharing (ACSS) which acts as an important building block of asynchronous multiparty computation (AMPC). Using our ACSS scheme, we design a statistical AMPC protocol with optimal resilience; i.e., with n = 3t+1, that privately communicates O(n^5 \log{\frac{1}{\epsilon}}) bits per multiplication gate. This significantly improves the communication complexity of only known optimally resilient statistical AMPC of [BKR93] that privately communicates \Omega(n^{11} (\log{\frac{1}{\epsilon}})^4) bits and A-cast \Omega(n^{11} (\log{\frac{1}{\epsilon}})^2 \log(n)) bits per multiplication gate. Both our ACSS and AVSS employ several new techniques, which are of independent interest

    Efficient Asynchronous Verifiable Secret Sharing and Multiparty Computation

    Get PDF
    Secure Multi-Party Computation (MPC) providing information theoretic security allows a set of n parties to securely compute an agreed function F over a finite field F{\mathbb F}, even if t parties are under the control of a computationally unbounded active adversary. Asynchronous MPC (AMPC) is an important variant of MPC, which works over an asynchronous network. It is well known that perfect AMPC is possible if and only if n \geq 4t+1, while statistical AMPC is possible if and only if n \geq 3t+1. In this paper, we study the communication complexity of AMPC protocols (both statistical and perfect) designed with exactly n = 4t+1 parties. Our major contributions in this paper are as follows: 1. Asynchronous Verifiable Secret Sharing (AVSS) is one of the main building blocks for AMPC. In this paper, we design two AVSS protocols with 4t+1 parties: the first one is statistically secure and has non-optimal resilience, while the second one is perfectly secure and has optimal resilience. Both these schemes achieve a common interesting property, which was not achieved by the previous schemes. Specifically, our AVSS schemes allow to share a secret through a polynomial of degree at most d, where t \leq d \leq 2t. In contrast, the existing AVSS schemes can share a secret only through a polynomial of degree at most t. The new property of our AVSS simplifies the degree reduction step for the evaluation of multiplication gates in an AMPC protocol. 2.Using our statistical AVSS, we design a statistical AMPC protocol with n = 4t+1 which communicates O(n^2) field elements per multiplication gate. Though this protocol has non-optimal resilience, it significantly improves the communication complexity of the existing statistical AMPC protocols. 3. We then present a perfect AMPC protocol with n = 4t+1 (using our perfect AVSS scheme), which also communicates O(n^2) field elements per multiplication gate. This protocol improves on our statistical AMPC protocol as it has optimal resilience. To the best of our knowledge, this is the most communication efficient perfect AMPC protocol in the information theoretic setting

    Part I:

    Get PDF

    Parallel Multiplier Designs for the Galois/Counter Mode of Operation

    Get PDF
    The Galois/Counter Mode of Operation (GCM), recently standardized by NIST, simultaneously authenticates and encrypts data at speeds not previously possible for both software and hardware implementations. In GCM, data integrity is achieved by chaining Galois field multiplication operations while a symmetric key block cipher such as the Advanced Encryption Standard (AES), is used to meet goals of confidentiality. Area optimization in a number of proposed high throughput GCM designs have been approached through implementing efficient composite Sboxes for AES. Not as much work has been done in reducing area requirements of the Galois multiplication operation in the GCM which consists of up to 30% of the overall area using a bruteforce approach. Current pipelined implementations of GCM also have large key change latencies which potentially reduce the average throughput expected under traditional internet traffic conditions. This thesis aims to address these issues by presenting area efficient parallel multiplier designs for the GCM and provide an approach for achieving low latency key changes. The widely known Karatsuba parallel multiplier (KA) and the recently proposed Fan-Hasan multiplier (FH) were designed for the GCM and implemented on ASIC and FPGA architectures. This is the first time these multipliers have been compared with a practical implementation, and the FH multiplier showed note worthy improvements over the KA multiplier in terms of delay with similar area requirements. Using the composite Sbox, ASIC designs of GCM implemented with subquadratic multipliers are shown to have an area savings of up to 18%, without affecting the throughput, against designs using the brute force Mastrovito multiplier. For low delay LUT Sbox designs in GCM, although the subquadratic multipliers are a part of the critical path, implementations with the FH multiplier showed the highest efficiency in terms of area resources and throughput over all other designs. FPGA results similarly showed a significant reduction in the number of slices using subquadratic multipliers, and the highest throughput to date for FPGA implementations of GCM was also achieved. The proposed reduced latency key change design, which supports all key types of AES, showed a 20% improvement in average throughput over other GCM designs that do not use the same techniques. The GCM implementations provided in this thesis provide some of the most area efficient, yet high throughput designs to date

    Journal of Telecommunications and Information Technology, 2006, nr 3

    Get PDF
    kwartalni
    corecore