On fast multiplication of a matrix by its transpose
We present a non-commutative algorithm for the multiplication of a
2x2-block-matrix by its transpose using 5 block products (3 recursive calls and
2 general products) over C or any finite field. We use geometric considerations
on the space of bilinear forms describing 2x2 matrix products to obtain this
algorithm and we show how to reduce the number of involved additions. The
resulting algorithm for arbitrary dimensions is a reduction of multiplication
of a matrix by its transpose to general matrix product, improving by a constant
factor previously known reductions. Finally we propose schedules with low memory
footprint that support a fast and memory efficient practical implementation
over a finite field. To conclude, we show how to use our result in LDLT
factorization. Comment: ISSAC 2020, Jul 2020, Kalamata, Greece
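For orientation, the baseline that this abstract improves on can be sketched in a few lines. The block below is not the 5-product algorithm of the paper; it is the textbook 2x2-block scheme for A·Aᵀ, which needs 6 block products (4 of the form X·Xᵀ, which could be recursive calls, and 2 general products). All function names are hypothetical, and plain Python lists over the integers are assumed purely for illustration.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Schoolbook matrix product a * b."""
    bt = transpose(b)  # rows of bt are the columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split a square matrix of even size into four equal blocks."""
    h = len(m) // 2
    a11 = [row[:h] for row in m[:h]]
    a12 = [row[h:] for row in m[:h]]
    a21 = [row[:h] for row in m[h:]]
    a22 = [row[h:] for row in m[h:]]
    return a11, a12, a21, a22


def gram_blocked(m):
    """Baseline blocked A * A^T using 6 block products (not the paper's 5)."""
    a11, a12, a21, a22 = split(m)
    # Four products of a block by its own transpose (symmetric results).
    c11 = add(matmul(a11, transpose(a11)), matmul(a12, transpose(a12)))
    c22 = add(matmul(a21, transpose(a21)), matmul(a22, transpose(a22)))
    # Two general products for the off-diagonal block.
    c21 = add(matmul(a21, transpose(a11)), matmul(a22, transpose(a12)))
    # The result is symmetric, so the upper-right block comes for free.
    c12 = transpose(c21)
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

Because A·Aᵀ is symmetric, only three of the four output blocks have to be computed; the abstract's contribution is to bring the product count for those blocks down from 6 to 5.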
Quantum Speedup by Quantum Annealing
We study the glued-trees problem of Childs et al. in the adiabatic model of
quantum computing and provide an annealing schedule to solve an oracular
problem exponentially faster than classically possible. The Hamiltonians
involved in the quantum annealing do not suffer from the so-called sign
problem. Unlike the typical scenario, our schedule is efficient even though the
minimum energy gap of the Hamiltonians is exponentially small in the problem
size. We discuss generalizations based on initial-state randomization to avoid
some slowdowns in adiabatic quantum computing due to small gaps. Comment: 7 pages
Scalable Emulation of Sign-Problem-Free Hamiltonians with Room Temperature p-bits
The growing field of quantum computing is based on the concept of a q-bit
which is a delicate superposition of 0 and 1, requiring cryogenic temperatures
for its physical realization along with challenging coherent coupling
techniques for entangling them. By contrast, a probabilistic bit or a p-bit is
a robust classical entity that fluctuates between 0 and 1, and can be
implemented at room temperature using present-day technology. Here, we show
that a probabilistic coprocessor built out of room temperature p-bits can be
used to accelerate simulations of a special class of quantum many-body systems
that are sign-problem-free or stoquastic, leveraging the well-known
Suzuki-Trotter decomposition that maps a $d$-dimensional quantum many-body
Hamiltonian to a $(d+1)$-dimensional classical Hamiltonian. This mapping allows
an efficient emulation of a quantum system by classical computers and is
commonly used in software to perform Quantum Monte Carlo (QMC) algorithms. By
contrast, we show that a compact, embedded MTJ-based coprocessor can serve as a
highly efficient hardware-accelerator for such QMC algorithms providing several
orders of magnitude improvement in speed compared to optimized CPU
implementations. Using realistic device-level SPICE simulations we demonstrate
that the correct quantum correlations can be obtained using a classical
p-circuit built with existing technology and operating at room temperature. The
proposed coprocessor can serve as a tool to study stoquastic quantum many-body
systems, overcoming challenges associated with physical quantum annealers. Comment: Fixed minor typos and expanded Appendix
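The p-bit described in this abstract can be illustrated with a small behavioral sketch. The abstract does not give a response function, so the sigmoidal dependence on the input used here is an assumption, as are the function names and the use of a seeded pseudo-random generator in place of a physical device.

```python
import math
import random


def pbit_sample(input_bias, rng):
    """Return 0 or 1; the probability of 1 follows an assumed sigmoid of the input."""
    # Numerically stable logistic function: 0.5 * (1 + tanh(x / 2)).
    p_one = 0.5 * (1.0 + math.tanh(input_bias / 2.0))
    return 1 if rng.random() < p_one else 0


def pbit_mean(input_bias, n_samples, seed=0):
    """Average of n_samples p-bit readings at a fixed input."""
    rng = random.Random(seed)
    return sum(pbit_sample(input_bias, rng) for _ in range(n_samples)) / n_samples
```

With zero input the bit fluctuates evenly between 0 and 1; a strongly positive or negative input pins it to 1 or 0. That tunable randomness is what a sampling-based method such as Quantum Monte Carlo draws on.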
On fast multiplication of a matrix by its transpose
We present a non-commutative algorithm for the multiplication of a block-matrix by its transpose over C or any finite field using 5 recursive products. We use geometric considerations on the space of bilinear forms describing 2×2 matrix products to obtain this algorithm and we show how to reduce the number of involved additions. The resulting algorithm for arbitrary dimensions is a reduction of multiplication of a matrix by its transpose to general matrix product, improving by a constant factor previously known reductions. Finally we propose space and time efficient schedules that enable us to provide fast practical implementations for higher-dimensional matrix products.
0.5 Petabyte Simulation of a 45-Qubit Quantum Circuit
Near-term quantum computers will soon reach sizes that are challenging to
directly simulate, even when employing the most powerful supercomputers. Yet,
the ability to simulate these early devices using classical computers is
crucial for calibration, validation, and benchmarking. In order to make use of
the full potential of systems featuring multi- and many-core processors, we use
automatic code generation and optimization of compute kernels, which also
enables performance portability. We apply a scheduling algorithm to quantum
supremacy circuits in order to reduce the required communication and simulate a
45-qubit circuit on the Cori II supercomputer using 8,192 nodes and 0.5
petabytes of memory. To our knowledge, this constitutes the largest quantum
circuit simulation to this date. Our highly-tuned kernels in combination with
the reduced communication requirements allow an improvement in time-to-solution
over state-of-the-art simulations by more than an order of magnitude at every
scale.
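The 0.5-petabyte figure can be checked with a short calculation. The sketch below assumes a full state vector of 2^n double-precision complex amplitudes at 16 bytes each, which the abstract does not state; the function name is hypothetical, and the per-node number covers only the state-vector share, not communication buffers.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector: 2**n amplitudes of the given size."""
    return (2 ** n_qubits) * bytes_per_amplitude


total_bytes = state_vector_bytes(45)              # 2**49 bytes
total_pib = total_bytes / 2 ** 50                 # in pebibytes
per_node_gib = total_bytes / 8192 / 2 ** 30       # share per node, in GiB
print(total_pib, per_node_gib)
```

Under these assumptions the state vector occupies exactly half a pebibyte, or 64 GiB per node when spread evenly over the 8,192 nodes. Each additional qubit doubles that requirement.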
Quantum Algorithms for Finding Constant-sized Sub-hypergraphs
We develop a general framework to construct quantum algorithms that detect if
a -uniform hypergraph given as input contains a sub-hypergraph isomorphic to
a prespecified constant-sized hypergraph. This framework is based on the
concept of nested quantum walks recently proposed by Jeffery, Kothari and
Magniez [SODA'13], and extends the methodology designed by Lee, Magniez and
Santha [SODA'13] for similar problems over graphs. As applications, we obtain a
quantum algorithm for finding a -clique in a -uniform hypergraph on
vertices with query complexity , and a quantum algorithm for
determining if a ternary operator over a set of size is associative with
query complexity . Comment: 18 pages; v2: changed title, added more background to the
introduction, added another application