12 research outputs found

    High performance lattice reduction on heterogeneous computing platform

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-014-1201-2The lattice reduction (LR) technique has become very important in many engineering fields. However, its high complexity makes difficult its use in real-time applications, especially in applications that deal with large matrices. As a solution, the modified block LLL (MB-LLL) algorithm was introduced, where several levels of parallelism were exploited: (a) fine-grained parallelism was achieved through the cost-reduced all-swap LLL (CR-AS-LLL) algorithm introduced together with the MB-LLL by Jzsa et al. (Proceedings of the tenth international symposium on wireless communication systems, 2013) and (b) coarse-grained parallelism was achieved by applying the block-reduction concept presented by Wetzel (Algorithmic number theory. Springer, New York, pp 323-337, 1998). In this paper, we present the cost-reduced MB-LLL (CR-MB-LLL) algorithm, which allows to significantly reduce the computational complexity of the MB-LLL by allowing the relaxation of the first LLL condition while executing the LR of submatrices, resulting in the delay of the Gram-Schmidt coefficients update and by using less costly procedures during the boundary checks. The effects of complexity reduction and implementation details are analyzed and discussed for several architectures. A mapping of the CR-MB-LLL on a heterogeneous platform is proposed and it is compared with implementations running on a dynamic parallelism enabled GPU and a multi-core CPU. The mapping on the architecture proposed allows a dynamic scheduling of kernels where the overhead introduced is hidden by the use of several CUDA streams. Results show that the execution time of the CR-MB-LLL algorithm on the heterogeneous platform outperforms the multi-core CPU and it is more efficient than the CR-AS-LLL algorithm in case of large matrices.Financial support for this study was provided by grants TAMOP-4.2.1./B-11/2/KMR-2011-0002, TAMOP-4.2.2/B-10/1-2010-0014 from the Pazmany Peter Catholic University, European Union ERDF, Spanish Government through TEC2012-38142-C04-01 project and Generalitat Valenciana through PROMETEO/2009/013 project.Jozsa, CM.; Domene Oltra, F.; Vidal Maciá, AM.; Piñero Sipán, MG.; González Salvador, A. (2014). High performance lattice reduction on heterogeneous computing platform. Journal of Supercomputing. 70(2):772-785. https://doi.org/10.1007/s11227-014-1201-2S772785702Józsa CM, Domene F, Piñero G, González A, Vidal AM (2013) Efficient GPU implementation of lattice-reduction-aided multiuser precoding. In: Proceedings of the tenth international symposium on wireless communication systems (ISWCS 2013)Wetzel S (1998) An efficient parallel block-reduction algorithm. In: Buhler JP (ed) Algorithmic number theory. Lecture notes in computer science, vol 1423. Springer, Berlin, Heidelberg, pp 323–337Wubben D, Seethaler D, Jaldén J, Matz G (2011) Lattice reduction. Signal Process Mag IEEE 28(3):70–91Lenstra AK, Lenstra HW, Lovász L (1982) Factoring polynomials with rational coefficients. Math Ann 261(4):515–534Bremner MR (2012) Lattice basis reduction: an introduction to the LLL algorithm and its applications. CRC Press, USAWu D, Eilert J, Liu D (2008) A programmable lattice-reduction aided detector for MIMO-OFDMA. In: 4th IEEE international conference on circuits and systems for communications (ICCSC 2008), pp 293–297Barbero LG, Milliner DL, Ratnarajah T, Barry JR, Cowan C (2009) Rapid prototyping of Clarkson’s lattice reduction for MIMO detection. In: IEEE international conference on communications (ICC’09), pp 1–5Gestner B, Zhang W, Ma X, Anderson D (2011) Lattice reduction for MIMO detection: from theoretical analysis to hardware realization. IEEE Trans Circ Syst I Regul Pap 58(4):813–826Shabany M, Youssef A, Gulak G (2013) High-throughput 0.13- \upmu μ m CMOS lattice reduction core supporting 880 Mb/s detection. IEEE Trans Very Large Scale Integr (VLSI) Syst 21(5):848–861Luo Y, Qiao S (2011) A parallel LLL algorithm. In: Proceedings of the fourth international C* conference on computer science and software engineering, pp 93–101Backes W, Wetzel S (2011) Parallel lattice basis reduction—the road to many-core. In: IEEE 13th international conference on high performance computing and communications (HPCC)Ahmad U, Amin A, Li M, Pollin S, Van der Perre L, Catthoor F (2011) Scalable block-based parallel lattice reduction algorithm for an SDR baseband processor. In: 2011 IEEE international conference on communications (ICC)Villard G (1992) Parallel lattice basis reduction. In: Papers from the international symposium on symbolic and algebraic computation (ISSAC’92). ACM, New YorkDomene F, Józsa CM, Vidal AM, Piñero G, Gonzalez A (2013) Performance analysis of a parallel lattice reduction algorithm on many-core architectures. In: Proceedings of the 13th international conference on computational and mathematical methods in science and engineeringGestner B, Zhang W, Ma X, Anderson DV (2008) VLSI implementation of a lattice reduction algorithm for low-complexity equalization. In: 4th IEEE international conference on circuits and systems for communications (ICCSC 2008), pp 643–647Burg A, Seethaler D, Matz G (2007) VLSI implementation of a lattice-reduction algorithm for multi-antenna broadcast precoding. In: IEEE international symposium on circuits and systems (ISCAS 2007), pp 673–676Bruderer L, Studer C, Wenk M, Seethaler D, Burg A (2010) VLSI implementation of a low-complexity LLL lattice reduction algorithm for MIMO detection. In: Proceedings of 2010 IEEE international symposium on circuits and systems (ISCAS

    Reduction algorithms for the cryptanalysis of lattice based asymmetrical cryptosystems

    Get PDF
    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2008Includes bibliographical references (leaves: 79-91)Text in English; Abstract: Turkish and Englishxi, 119 leavesThe theory of lattices has attracted a great deal of attention in cryptology in recent years. Several cryptosystems are constructed based on the hardness of the lattice problems such as the shortest vector problem and the closest vector problem. The aim of this thesis is to study the most commonly used lattice basis reduction algorithms, namely Lenstra Lenstra Lovasz (LLL) and Block Kolmogorov Zolotarev (BKZ) algorithms, which are utilized to approximately solve the mentioned lattice based problems.Furthermore, the most popular variants of these algorithms in practice are evaluated experimentally by varying the common reduction parameter delta in order to propose some practical assessments about the effect of this parameter on the process of basis reduction.These kind of practical assessments are believed to have non-negligible impact on the theory of lattice reduction, and so the cryptanalysis of lattice cryptosystems, due to thefact that the contemporary nature of the reduction process is mainly controlled by theheuristics

    BoostReduce - A Framework For Strong Lattice Basis Reduction

    Get PDF
    In this paper, we propose a new generic reduction framework BoostReduce for strong lattice basis reduction. At the core of our new framework is an iterative method which uses a newly-developed algorithm for finding short lattice vectors and integrating them efficiently into an improved lattice basis. We present BoostBKZ as an instance of BoostReduce using the Block-Korkine-Zolotarev (BKZ) reduction. BoostBKZ is tailored to make effective use of modern computer architectures in that it takes advantage of multiple threads. Experimental results of BoostBKZ show a significant reduction in running time while maintaining the quality of the reduced lattice basis in comparison to the traditional BKZ reduction algorithm

    Algebraic and Euclidean Lattices: Optimal Lattice Reduction and Beyond

    Get PDF
    We introduce a framework generalizing lattice reduction algorithms to module lattices in order to practically and efficiently solve the γ\gamma-Hermite Module-SVP problem over arbitrary cyclotomic fields. The core idea is to exploit the structure of the subfields for designing a doubly-recursive strategy of reduction: both recursive in the rank of the module and in the field we are working in. Besides, we demonstrate how to leverage the inherent symplectic geometry existing in the tower of fields to provide a significant speed-up of the reduction for rank two modules. The recursive strategy over the rank can also be applied to the reduction of Euclidean lattices, and we can perform a reduction in asymptotically almost the same time as matrix multiplication. As a byproduct of the design of these fast reductions, we also generalize to all cyclotomic fields and provide speedups for many previous number theoretical algorithms. Quantitatively, we show that a module of rank 2 over a cyclotomic field of degree nn can be heuristically reduced within approximation factor 2O~(n)2^{\tilde{O}(n)} in time O~(n2B)\tilde{O}(n^2B), where BB is the bitlength of the entries. For BB large enough, this complexity shrinks to O~(nlog23B)\tilde{O}(n^{\log_2 3}B). This last result is particularly striking as it goes below the estimate of n2Bn^2B swaps given by the classical analysis of the LLL algorithm using the so-called potential

    Implementation and Evaluation of Algorithmic Skeletons: Parallelisation of Computer Algebra Algorithms

    Get PDF
    This thesis presents design and implementation approaches for the parallel algorithms of computer algebra. We use algorithmic skeletons and also further approaches, like data parallel arithmetic and actors. We have implemented skeletons for divide and conquer algorithms and some special parallel loops, that we call ‘repeated computation with a possibility of premature termination’. We introduce in this thesis a rational data parallel arithmetic. We focus on parallel symbolic computation algorithms, for these algorithms our arithmetic provides a generic parallelisation approach. The implementation is carried out in Eden, a parallel functional programming language based on Haskell. This choice enables us to encode both the skeletons and the programs in the same language. Moreover, it allows us to refrain from using two different languages—one for the implementation and one for the interface—for our implementation of computer algebra algorithms. Further, this thesis presents methods for evaluation and estimation of parallel execution times. We partition the parallel execution time into two components. One of them accounts for the quality of the parallelisation, we call it the ‘parallel penalty’. The other is the sequential execution time. For the estimation, we predict both components separately, using statistical methods. This enables very confident estimations, although using drastically less measurement points than other methods. We have applied both our evaluation and estimation approaches to the parallel programs presented in this thesis. We haven also used existing estimation methods. We developed divide and conquer skeletons for the implementation of fast parallel multiplication. We have implemented the Karatsuba algorithm, Strassen’s matrix multiplication algorithm and the fast Fourier transform. The latter was used to implement polynomial convolution that leads to a further fast multiplication algorithm. Specially for our implementation of Strassen algorithm we have designed and implemented a divide and conquer skeleton basing on actors. We have implemented the parallel fast Fourier transform, and not only did we use new divide and conquer skeletons, but also developed a map-and-transpose skeleton. It enables good parallelisation of the Fourier transform. The parallelisation of Karatsuba multiplication shows a very good performance. We have analysed the parallel penalty of our programs and compared it to the serial fraction—an approach, known from literature. We also performed execution time estimations of our divide and conquer programs. This thesis presents a parallel map+reduce skeleton scheme. It allows us to combine the usual parallel map skeletons, like parMap, farm, workpool, with a premature termination property. We use this to implement the so-called ‘parallel repeated computation’, a special form of a speculative parallel loop. We have implemented two probabilistic primality tests: the Rabin–Miller test and the Jacobi sum test. We parallelised both with our approach. We analysed the task distribution and stated the fitting configurations of the Jacobi sum test. We have shown formally that the Jacobi sum test can be implemented in parallel. Subsequently, we parallelised it, analysed the load balancing issues, and produced an optimisation. The latter enabled a good implementation, as verified using the parallel penalty. We have also estimated the performance of the tests for further input sizes and numbers of processing elements. Parallelisation of the Jacobi sum test and our generic parallelisation scheme for the repeated computation is our original contribution. The data parallel arithmetic was defined not only for integers, which is already known, but also for rationals. We handled the common factors of the numerator or denominator of the fraction with the modulus in a novel manner. This is required to obtain a true multiple-residue arithmetic, a novel result of our research. Using these mathematical advances, we have parallelised the determinant computation using the Gauß elimination. As always, we have performed task distribution analysis and estimation of the parallel execution time of our implementation. A similar computation in Maple emphasised the potential of our approach. Data parallel arithmetic enables parallelisation of entire classes of computer algebra algorithms. Summarising, this thesis presents and thoroughly evaluates new and existing design decisions for high-level parallelisations of computer algebra algorithms

    ETRUSign: NTRUSign sobre los enteros de Eisenstein

    Get PDF

    Parallel Lattice Basis Reduction -- The Road to Many-Core

    No full text
    This paper introduces a new parallel algorithm that allows for an efficient LLL reduction using today’s emerging many-core systems. This work develops suitable methods that efficiently implement the idea of splitting a lattice basis into smaller subproblems, LLL reducing the subproblems, and recombining the subproblems afterwards to obtain an overall LLL reduced basis. The new many-core algorithm outperforms any current parallel LLL algorithm. Experiments on a 48-core test system show a speed-up of approximately 10 for SVP challenge type lattice bases and a remarkable speed-up of approximately 50 for knapsack type lattice bases
    corecore