840 research outputs found

    The Mailman Algorithm: A Note on Matrix Vector Multiplication

    Full text link

    New Graph Decompositions and Combinatorial Boolean Matrix Multiplication Algorithms

    Full text link
    We revisit the fundamental Boolean Matrix Multiplication (BMM) problem. With the invention of algebraic fast matrix multiplication over 50 years ago, it also became known that BMM can be solved in truly subcubic O(nω)O(n^\omega) time, where ω<3\omega<3; much work has gone into bringing ω\omega closer to 22. Since then, a parallel line of work has sought comparably fast combinatorial algorithms but with limited success. The naive O(n3)O(n^3)-time algorithm was initially improved by a log2n\log^2{n} factor [Arlazarov et al.; RAS'70], then by log2.25n\log^{2.25}{n} [Bansal and Williams; FOCS'09], then by log3n\log^3{n} [Chan; SODA'15], and finally by log4n\log^4{n} [Yu; ICALP'15]. We design a combinatorial algorithm for BMM running in time n3/2Ω(logn7)n^3 / 2^{\Omega(\sqrt[7]{\log n})} -- a speed-up over cubic time that is stronger than any poly-log factor. This comes tantalizingly close to refuting the conjecture from the 90s that truly subcubic combinatorial algorithms for BMM are impossible. This popular conjecture is the basis for dozens of fine-grained hardness results. Our main technical contribution is a new regularity decomposition theorem for Boolean matrices (or equivalently, bipartite graphs) under a notion of regularity that was recently introduced and analyzed analytically in the context of communication complexity [Kelley, Lovett, Meka; arXiv'23], and is related to a similar notion from the recent work on 33-term arithmetic progression free sets [Kelley, Meka; FOCS'23]

    Central Asia as an object of Orientalist narratives in the Age of Bandung

    Get PDF

    Algorithms for Solving Linear and Polynomial Systems of Equations over Finite Fields with Applications to Cryptanalysis

    Get PDF
    This dissertation contains algorithms for solving linear and polynomial systems of equations over GF(2). The objective is to provide fast and exact tools for algebraic cryptanalysis and other applications. Accordingly, it is divided into two parts. The first part deals with polynomial systems. Chapter 2 contains a successful cryptanalysis of Keeloq, the block cipher used in nearly all luxury automobiles. The attack is more than 16,000 times faster than brute force, but queries 0.62 × 2^32 plaintexts. The polynomial systems of equations arising from that cryptanalysis were solved via SAT-solvers. Therefore, Chapter 3 introduces a new method of solving polynomial systems of equations by converting them into CNF-SAT problems and using a SAT-solver. Finally, Chapter 4 contains a discussion on how SAT-solvers work internally. The second part deals with linear systems over GF(2), and other small fields (and rings). These occur in cryptanalysis when using the XL algorithm, which converts polynomial systems into larger linear systems. We introduce a new complexity model and data structures for GF(2)-matrix operations. This is discussed in Appendix B but applies to all of Part II. Chapter 5 contains an analysis of "the Method of Four Russians" for multiplication and a variant for matrix inversion, which is log n faster than Gaussian Elimination, and can be combined with Strassen-like algorithms. Chapter 6 contains an algorithm for accelerating matrix multiplication over small finite fields. It is feasible but the memory cost is so high that it is mostly of theoretical interest. Appendix A contains some discussion of GF(2)-linear algebra and how it differs from linear algebra in R and C. Appendix C discusses algorithms faster than Strassen's algorithm, and contains proofs that matrix multiplication, matrix squaring, triangular matrix inversion, LUP-factorization, general matrix in- version and the taking of determinants, are equicomplex. These proofs are already known, but are here gathered into one place in the same notation

    Brute-Force Cryptanalysis with Aging Hardware: Controlling Half the Output of SHA-256

    Get PDF
    This paper describes a "three-way collision" on SHA-256 truncated to 128 bits. More precisely, it gives three random-looking bit strings whose hashes by SHA-256 maintain a non-trivial relation: their XOR starts with 128 zero bits. They have been found by brute-force, without exploiting any cryptographic weakness in the hash function itself. This shows that birthday-like computations on 128 bits are becoming increasingly feasible, even for academic teams without substantial means. These bit strings have been obtained by solving a large instance of the three-list generalized birthday problem, a difficult case known as the 3XOR problem. The whole computation consisted of two equally challenging phases: assembling the 3XOR instance and solving it. It was made possible by the combination of: 1) recent progress on algorithms for the 3XOR problem, 2) creative use of "dedicated" hardware accelerators, 3) adapted implementations of 3XOR algorithms that could run on massively parallel machines. Building the three lists required 2 67.6 evaluations of the compression function of SHA-256. They were performed in 7 calendar months by two obsolete secondhand bitcoin mining devices, which can now be acquired on eBay for about 80e. The actual instance of the 3XOR problem was solved in 300 CPU years on a 7-year old IBM Bluegene/Q computer, a few weeks before it was scrapped. To the best of our knowledge, this is the first explicit 128-bit collision-like result for SHA-256. It is the first bitcoin-accelerated cryptanalytic computation and it is also one of the largest public ones

    Accelerating transitive closure of large-scale sparse graphs

    Get PDF
    Finding the transitive closure of a graph is a fundamental graph problem where another graph is obtained in which an edge exists between two nodes if and only if there is a path in our graph from one node to the other. The reachability matrix of a graph is its transitive closure. This thesis describes a novel approach that uses anti-sections to obtain the transitive closure of a graph. It also examines its advantages when implemented in parallel on a CPU using the Hornet graph data structure. Graph representations of real-world systems are typically sparse in nature due to lesser connectivity between nodes. The anti-section approach is designed specifically to improve performance for large scale sparse graphs. The NVIDIA Titan V CPU is used for the execution of the anti-section parallel implementations. The Dual-Round and Hash-Based implementations of the Anti-Section transitive closure approach provide a significant speedup over several parallel and sequential implementations
    corecore