840 research outputs found
New Graph Decompositions and Combinatorial Boolean Matrix Multiplication Algorithms
We revisit the fundamental Boolean Matrix Multiplication (BMM) problem. With
the invention of algebraic fast matrix multiplication over 50 years ago, it
also became known that BMM can be solved in truly subcubic time,
where ; much work has gone into bringing closer to .
Since then, a parallel line of work has sought comparably fast combinatorial
algorithms but with limited success. The naive -time algorithm was
initially improved by a factor [Arlazarov et al.; RAS'70], then by
[Bansal and Williams; FOCS'09], then by [Chan;
SODA'15], and finally by [Yu; ICALP'15].
We design a combinatorial algorithm for BMM running in time -- a speed-up over cubic time that is stronger
than any poly-log factor. This comes tantalizingly close to refuting the
conjecture from the 90s that truly subcubic combinatorial algorithms for BMM
are impossible. This popular conjecture is the basis for dozens of fine-grained
hardness results.
Our main technical contribution is a new regularity decomposition theorem for
Boolean matrices (or equivalently, bipartite graphs) under a notion of
regularity that was recently introduced and analyzed analytically in the
context of communication complexity [Kelley, Lovett, Meka; arXiv'23], and is
related to a similar notion from the recent work on -term arithmetic
progression free sets [Kelley, Meka; FOCS'23]
Algorithms for Solving Linear and Polynomial Systems of Equations over Finite Fields with Applications to Cryptanalysis
This dissertation contains algorithms for solving linear and polynomial systems
of equations over GF(2). The objective is to provide fast and exact tools for algebraic
cryptanalysis and other applications. Accordingly, it is divided into two parts.
The first part deals with polynomial systems. Chapter 2 contains a successful
cryptanalysis of Keeloq, the block cipher used in nearly all luxury automobiles.
The attack is more than 16,000 times faster than brute force, but queries 0.62 × 2^32
plaintexts. The polynomial systems of equations arising from that cryptanalysis
were solved via SAT-solvers. Therefore, Chapter 3 introduces a new method of
solving polynomial systems of equations by converting them into CNF-SAT problems
and using a SAT-solver. Finally, Chapter 4 contains a discussion on how SAT-solvers
work internally.
The second part deals with linear systems over GF(2), and other small fields
(and rings). These occur in cryptanalysis when using the XL algorithm, which converts polynomial systems into larger linear systems. We introduce a new complexity
model and data structures for GF(2)-matrix operations. This is discussed in Appendix B but applies to all of Part II. Chapter 5 contains an analysis of "the Method
of Four Russians" for multiplication and a variant for matrix inversion, which is
log n faster than Gaussian Elimination, and can be combined with Strassen-like algorithms. Chapter 6 contains an algorithm for accelerating matrix multiplication
over small finite fields. It is feasible but the memory cost is so high that it is mostly
of theoretical interest. Appendix A contains some discussion of GF(2)-linear algebra
and how it differs from linear algebra in R and C. Appendix C discusses algorithms
faster than Strassen's algorithm, and contains proofs that matrix multiplication,
matrix squaring, triangular matrix inversion, LUP-factorization, general matrix in-
version and the taking of determinants, are equicomplex. These proofs are already
known, but are here gathered into one place in the same notation
Brute-Force Cryptanalysis with Aging Hardware: Controlling Half the Output of SHA-256
This paper describes a "three-way collision" on SHA-256 truncated to 128 bits. More precisely, it gives three random-looking bit strings whose hashes by SHA-256 maintain a non-trivial relation: their XOR starts with 128 zero bits. They have been found by brute-force, without exploiting any cryptographic weakness in the hash function itself. This shows that birthday-like computations on 128 bits are becoming increasingly feasible, even for academic teams without substantial means. These bit strings have been obtained by solving a large instance of the three-list generalized birthday problem, a difficult case known as the 3XOR problem. The whole computation consisted of two equally challenging phases: assembling the 3XOR instance and solving it. It was made possible by the combination of: 1) recent progress on algorithms for the 3XOR problem, 2) creative use of "dedicated" hardware accelerators, 3) adapted implementations of 3XOR algorithms that could run on massively parallel machines. Building the three lists required 2 67.6 evaluations of the compression function of SHA-256. They were performed in 7 calendar months by two obsolete secondhand bitcoin mining devices, which can now be acquired on eBay for about 80e. The actual instance of the 3XOR problem was solved in 300 CPU years on a 7-year old IBM Bluegene/Q computer, a few weeks before it was scrapped. To the best of our knowledge, this is the first explicit 128-bit collision-like result for SHA-256. It is the first bitcoin-accelerated cryptanalytic computation and it is also one of the largest public ones
Accelerating transitive closure of large-scale sparse graphs
Finding the transitive closure of a graph is a fundamental graph problem where another graph is obtained in which an edge exists between two nodes if and only if there is a path in our graph from one node to the other. The reachability matrix of a graph is its transitive closure. This thesis describes a novel approach that uses anti-sections to obtain the transitive closure of a graph. It also examines its advantages when implemented in parallel on a CPU using the Hornet graph data structure.
Graph representations of real-world systems are typically sparse in nature due to lesser connectivity between nodes. The anti-section approach is designed specifically to improve performance for large scale sparse graphs. The NVIDIA Titan V CPU is used for the execution of the anti-section parallel implementations. The Dual-Round and Hash-Based implementations of the Anti-Section transitive closure approach provide a significant speedup over several parallel and sequential implementations
- …