An algorithm for DNA read alignment on quantum accelerators
As small-scale quantum processors transition from experimental physics
labs to industrial products, they make it possible to run important
algorithms efficiently in various fields. In this paper, we propose a quantum
algorithm to address the challenging field of big data processing for genome
sequence reconstruction. This research describes an architecture-aware
implementation of a quantum algorithm for sub-sequence alignment. A new
algorithm named QiBAM (quantum indexed bidirectional associative memory) is
proposed that uses approximate pattern matching based on Hamming distances.
QiBAM extends Grover's search algorithm in two ways to allow for: (1)
approximate matches needed for read errors in genomics, and (2) a distributed
search for multiple solutions over the quantum encoding of DNA sequences. This
approach gives a quadratic speedup over the classical algorithm. A full
implementation of the algorithm is provided and verified using the OpenQL
compiler and QX simulator framework. This represents a first exploration
towards a full-stack quantum accelerated genome sequencing pipeline design. The
open-source implementation can be found on
https://github.com/prince-ph0en1x/QAGS.
Comment: Keywords: quantum algorithms, quantum search, DNA read alignment,
genomics, associative memory, accelerators, in-memory computing
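Setting the quantum encoding aside, the matching problem QiBAM targets has a compact classical statement. Below is a minimal classical reference for Hamming-distance read alignment (plain Python, hypothetical names); Grover-style search promises a quadratic speedup over this linear scan of the reference.

```python
# Classical reference for the problem QiBAM accelerates: find positions in
# a reference genome whose substring is within a Hamming-distance threshold
# of a (possibly erroneous) sequencing read.
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def align_read(reference: str, read: str, max_mismatches: int) -> list[int]:
    """Return all offsets where the read matches the reference up to
    max_mismatches substitutions (models sequencing read errors)."""
    k = len(read)
    return [i for i in range(len(reference) - k + 1)
            if hamming(reference[i:i + k], read) <= max_mismatches]

# Example: a read with one substitution error still aligns at offset 4.
print(align_read("ACGTACGTGACCT", "ACGTGACG", max_mismatches=1))  # [4]
```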
Application-Driven Near-Data Processing for Similarity Search
Similarity search is a key to a variety of applications including
content-based search for images and video, recommendation systems, data
deduplication, natural language processing, computer vision, databases,
computational biology, and computer graphics. At its core, similarity search
manifests as k-nearest neighbors (kNN), a computationally simple primitive
consisting of highly parallel distance calculations and a global top-k sort.
However, kNN is poorly supported by today's architectures because of its high
memory bandwidth requirements.
This paper proposes an application-driven near-data processing accelerator
for similarity search: the Similarity Search Associative Memory (SSAM). By
instantiating compute units close to memory, SSAM benefits from the higher
memory bandwidth and density exposed by emerging memory technologies. We
evaluate the SSAM design down to layout on top of the Micron hybrid memory cube
(HMC), and show that SSAM can achieve up to two orders of magnitude
improvement in area-normalized throughput and energy efficiency over multicore
CPUs; we also show SSAM is faster and more energy efficient than competing GPUs
and FPGAs. Finally, we show that SSAM is also useful for other data intensive
tasks like kNN index construction, and can be generalized to semantically
function as a high capacity content addressable memory.
Comment: 15 pages, 8 figures, 7 tables
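At its core, the kNN primitive is the two phases named above: highly parallel distance calculations followed by a global top-k selection. A minimal NumPy sketch of that brute-force baseline, the workload that SSAM's near-data compute units accelerate (names here are illustrative):

```python
import numpy as np

def knn(database: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k-nearest neighbors: the two phases the abstract names.
    database: (N, d) points, query: (d,) vector."""
    # Phase 1: highly parallel distance calculations (one per database row).
    dists = np.linalg.norm(database - query, axis=1)
    # Phase 2: global top-k selection over the distances (unordered).
    return np.argpartition(dists, k)[:k]

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64))
q = rng.standard_normal(64)
print(knn(db, q, k=10))  # indices of the 10 closest vectors
```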
FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads
In this work, we propose FUSE, a novel GPU cache system that integrates
spin-transfer torque magnetic random-access memory (STT-MRAM) into the on-chip
L1D cache. FUSE can minimize the number of outgoing memory accesses over the
interconnection network of the GPU's multiprocessors, which in turn can
considerably improve the level of massive computing parallelism in GPUs.
Specifically, FUSE predicts the read level of GPU memory accesses by extracting
GPU runtime information and places write-once-read-multiple (WORM) data blocks
into the STT-MRAM, while accommodating write-multiple data blocks over a small
portion of SRAM in the L1D cache. To further reduce the off-chip memory
accesses, FUSE also allows WORM data blocks to be allocated anywhere in the
STT-MRAM by approximating the associativity with the limited number of tag
comparators and I/O peripherals. Our evaluation results show that, in
comparison to a traditional GPU cache, our proposed heterogeneous cache reduces
the number of outgoing memory references by 32% across the interconnection
network, thereby improving the overall performance by 217% and reducing energy
cost by 53%.
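As a rough illustration of the placement policy just described, the toy sketch below routes blocks predicted to be write-once-read-multiple (WORM) to STT-MRAM and write-heavy blocks to SRAM. The thresholds and counters are assumptions for illustration, not the paper's runtime predictor.

```python
# Toy sketch of FUSE-style block placement in a heterogeneous L1D cache:
# WORM blocks go to the large STT-MRAM partition, write-intensive blocks
# to the small SRAM partition. Thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class BlockStats:
    writes: int
    reads: int

def place_block(stats: BlockStats) -> str:
    """Return the L1D partition a cache block should occupy."""
    if stats.writes <= 1 and stats.reads > 1:
        return "STT-MRAM"   # WORM data: cheap reads, writes are rare
    return "SRAM"           # write-multiple data: avoid costly MRAM writes

print(place_block(BlockStats(writes=1, reads=8)))  # -> STT-MRAM
print(place_block(BlockStats(writes=5, reads=2)))  # -> SRAM
```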
From Ansätze to Z-gates: a NASA View of Quantum Computing
For the last few years, the NASA Quantum Artificial Intelligence Laboratory
(QuAIL) has been performing research to assess the potential impact of quantum
computers on challenging computational problems relevant to future NASA
missions. A key aspect of this research is devising methods to most effectively
utilize emerging quantum computing hardware. Research questions include what
experiments on early quantum hardware would give the most insight into the
potential impact of quantum computing, the design of algorithms to explore on
such hardware, and the development of tools to minimize the quantum resource
requirements. We survey work relevant to these questions, with a particular
emphasis on our recent work in quantum algorithms and applications, in
elucidating mechanisms of quantum mechanics and their uses for quantum
computational purposes, and in simulation, compilation, and physics-inspired
classical algorithms. To our early application thrusts in planning and
scheduling, fault diagnosis, and machine learning, we add thrusts related to
robustness of communication networks and the simulation of many-body systems
for material science and chemistry. We provide a brief update on quantum
annealing work, but concentrate on gate-model quantum computing research
advances within the last couple of years.
Comment: 20 pages plus extensive references, 3 figures
Near-Term Quantum-Classical Associative Adversarial Networks
We introduce a new hybrid quantum-classical adversarial machine learning
architecture called a quantum-classical associative adversarial network (QAAN).
This architecture consists of a classical generative adversarial network with a
small auxiliary quantum Boltzmann machine that is simultaneously trained on an
intermediate layer of the discriminator of the generative network. We
numerically study the performance of QAANs compared to their classical
counterparts on the MNIST and CIFAR-10 data sets, and show that QAANs attain a
higher quality of learning when evaluated using the Inception score and the
Fréchet Inception distance. As the QAAN architecture only relies on
sampling simple local observables of a small quantum Boltzmann machine, this
model is particularly amenable for implementation on the current and next
generations of quantum devices.
Comment: 11 pages, 9 figures
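The abstract fixes only the data flow: a classical GAN whose discriminator exposes an intermediate layer to a small Boltzmann machine trained alongside it, whose samples then shape the generator's latent distribution. The structural sketch below uses stub NumPy models and a classical stand-in for the quantum Boltzmann machine; every name in it is an assumption, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((784, 128)) * 0.01   # stub discriminator layer
bm_bias = np.zeros(128)                       # stand-in for the BM's model

def discriminator_features(x):
    """Intermediate discriminator layer; the BM is trained on these."""
    return np.tanh(x @ W1)

def train_step(real_batch, batch_size):
    global bm_bias
    h_real = discriminator_features(real_batch)
    # The auxiliary (here: classical) Boltzmann machine tracks statistics
    # of the intermediate layer; a real QBM would learn its distribution.
    bm_bias = 0.9 * bm_bias + 0.1 * h_real.mean(axis=0)
    # The generator samples latents from the BM's learned distribution
    # rather than a fixed prior (simplified to a shifted Gaussian here).
    return bm_bias + rng.standard_normal((batch_size, 128))

latents = train_step(rng.standard_normal((32, 784)), batch_size=32)
print(latents.shape)  # (32, 128) latent codes for the generator
```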
Combined Compute and Storage: Configurable Memristor Arrays to Accelerate Search
Emerging technologies present opportunities for system designers to meet the
challenges presented by competing trends of big data analytics and limitations
on CMOS scaling. Specifically, memristors are an emerging high-density
technology where the individual memristors can be used as storage or to perform
computation. The voltage applied across a memristor determines its behavior
(storage vs. compute), which enables a configurable memristor substrate that
can embed computation with storage.
This paper explores accelerating point and range search queries as instances
of the more general configurable combined compute and storage capabilities of
memristor arrays. We first present MemCAM, a configurable memristor-based
content addressable memory for the cases when fast, infrequent searches over
large datasets are required. For frequent searches, memristor lifetime becomes
a concern. To increase memristor array lifetime we introduce hybrid data
structures that combine trees with MemCAM using conventional CMOS
processor/cache hierarchies for the upper levels of the tree and configurable
memristor technologies for lower levels.
We use SPICE to analyze energy consumption and access time of memristors and
use analytic models to evaluate the performance of configurable hybrid data
structures. The results show that with acceptable energy consumption our
configurable hybrid data structures improve performance of search intensive
applications and achieve lifetime in years or decades under continuous queries.
Furthermore, the configurability of memristor arrays and the proposed data
structures provide opportunities to tune the trade-off between performance and
lifetime, and the data structures can be easily adapted to future memristors or
other technologies with improved endurance.
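The hybrid structure above can be pictured as a shallow tree whose upper levels live in the CMOS cache hierarchy and whose lowest level is a set of CAM-searchable memristor blocks. A toy point-search sketch (hypothetical Python; the CAM's one-shot parallel match is emulated by a scan of a single small bucket):

```python
import bisect

# Toy hybrid index: sorted separator keys model the CMOS-resident tree
# levels; each bucket models a memristor CAM block matched in parallel.
class HybridIndex:
    def __init__(self, buckets):
        self.buckets = buckets                       # list of key lists
        self.separators = [b[-1] for b in buckets]   # max key per bucket

    def point_search(self, key):
        # Upper levels: conventional comparisons narrow to one CAM block.
        i = bisect.bisect_left(self.separators, key)
        if i == len(self.buckets):
            return False
        # Lower level: a CAM compares all entries simultaneously; in
        # software this is a linear scan of one small bucket.
        return key in self.buckets[i]

idx = HybridIndex([[2, 5, 8], [11, 13, 17], [19, 23, 29]])
print(idx.point_search(13), idx.point_search(14))  # True False
```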
Quantum Computer Architecture: Towards Full-Stack Quantum Accelerators
This paper presents the definition and implementation of a quantum computer
architecture to enable creating a new computational device - a quantum computer
as an accelerator. In this paper, we present explicitly the idea of a quantum
accelerator which contains the full stack of the layers of an accelerator. Such
a stack starts at the highest level describing the target application of the
accelerator. The next layer abstracts the quantum logic outlining the algorithm
that is to be executed on the quantum accelerator. In our case, the logic is
expressed in the universal quantum-classical hybrid computation language
developed in the group, called OpenQL, which presents the quantum processor
as a computational accelerator. The OpenQL compiler translates the program to a
common assembly language, called cQASM, which can be executed on a quantum
simulator. cQASM represents the instruction set that can be executed by the
micro-architecture implemented in the quantum accelerator. In a subsequent
step, the compiler can convert the cQASM to generate the eQASM, which is
executable on a particular experimental device incorporating the
platform-specific parameters. This way, we are able to distinguish clearly the
experimental research towards better qubits, and the industrial and societal
applications that need to be developed and executed on a quantum device. The
first case offers experimental physicists a full-stack experimental
platform using realistic qubits with decoherence and finite error rates, while
the second case offers perfect qubits to the quantum application developer,
with no decoherence or errors. We conclude the paper by explicitly
presenting three examples of full-stack quantum accelerators, for an
experimental superconducting processor, for quantum accelerated genome
sequencing and for near-term generic optimisation problems based on quantum
heuristic approaches.
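The stack reads as a lowering pipeline: application logic written in OpenQL is compiled to the common assembly cQASM (executable on a simulator with perfect qubits), then lowered to platform-specific eQASM for real hardware. The toy functions below only illustrate that staging; they are not the real OpenQL toolchain APIs, and the output strings are placeholders.

```python
# Toy illustration of the full-stack lowering pipeline the paper describes.
# Stage names follow the paper; the transformations are placeholders.
def openql_to_cqasm(program: str) -> str:
    """Compile the high-level quantum logic to the common assembly, cQASM."""
    return f"version 1.0\nqubits 2\n{program}"

def cqasm_to_eqasm(cqasm: str, platform: dict) -> str:
    """Lower cQASM to executable eQASM with platform-specific parameters."""
    cycle = platform["cycle_time_ns"]
    return "\n".join(f"{cycle}ns {line}" for line in cqasm.splitlines())

kernel = "h q[0]\ncnot q[0], q[1]"       # quantum logic layer
cqasm = openql_to_cqasm(kernel)           # runs on a simulator (perfect qubits)
eqasm = cqasm_to_eqasm(cqasm, {"cycle_time_ns": 20})  # runs on real hardware
print(eqasm)
```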
High-Performance Out-of-core Block Randomized Singular Value Decomposition on GPU
Fast computation of singular value decomposition (SVD) is of great interest
in various machine learning tasks. Recently, SVD methods based on randomized
linear algebra have shown significant speedup in this regime. This paper
attempts to further accelerate the computation by harnessing a modern computing
architecture, namely graphics processing unit (GPU), with the goal of
processing large-scale data that may not fit in the GPU memory. It leads to a
new block randomized algorithm that fully utilizes the power of GPUs and
efficiently processes large-scale data in an out-of-core fashion. Our
experiments show that the proposed block randomized SVD (BRSVD) method
outperforms existing randomized SVD methods in terms of speed while retaining
the same accuracy. We also show its application to convex robust principal
component analysis, which shows significant speedup in computer vision
applications.
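For orientation, the core randomized SVD recipe that block methods reorganize for out-of-core GPU execution looks as follows; this is a minimal in-memory NumPy sketch in the style of Halko et al., with the block streaming and GPU offload omitted.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10):
    """Randomized SVD: sketch the range of A with a random test matrix,
    orthonormalize, project, then take an exact SVD of the small matrix."""
    omega = np.random.default_rng(0).standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ omega)   # orthonormal basis for the range of A
    B = Q.T @ A                      # small (rank + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

# A block/out-of-core variant streams row blocks of A through the GPU for
# the two large products (A @ omega and Q.T @ A) instead of holding A in memory.
A = np.random.default_rng(1).standard_normal((2000, 500))
U, s, Vt = randomized_svd(A, rank=20)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(f"relative error of the rank-20 approximation: {err:.3f}")
```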
A Survey of Neuromorphic Computing and Neural Networks in Hardware
Neuromorphic computing has come to refer to a variety of brain-inspired
computers, devices, and models that contrast the pervasive von Neumann computer
architecture. This biologically inspired approach has created highly connected
synthetic neurons and synapses that can be used to model neuroscience theories
as well as solve challenging machine learning problems. The promise of the
technology is to create a brain-like ability to learn and adapt, but the
technical challenges are significant, starting with an accurate neuroscience
model of how the brain works, to finding materials and engineering
breakthroughs to build devices to support these models, to creating a
programming framework so the systems can learn, to creating applications with
brain-like capabilities. In this work, we provide a comprehensive survey of the
research and motivations for neuromorphic computing over its history. We begin
with a 35-year review of the motivations and drivers of neuromorphic computing,
then look at the major research areas of the field, which we define as
neuro-inspired models, algorithms and learning approaches, hardware and
devices, supporting systems, and finally applications. We conclude with a broad
discussion on the major research topics that need to be addressed in the coming
years to see the promise of neuromorphic computing fulfilled. The goals of this
work are to provide an exhaustive review of the research conducted in
neuromorphic computing since the inception of the term, and to motivate further
work by illuminating gaps in the field where new research is needed.
Efficient and Scalable Algorithms for Smoothed Particle Hydrodynamics on Hybrid Shared/Distributed-Memory Architectures
This paper describes a new fast and implicitly parallel approach to
neighbour-finding in multi-resolution Smoothed Particle Hydrodynamics (SPH)
simulations. This new approach is based on hierarchical cell decompositions and
sorted interactions, within a task-based formulation. It is shown to be faster
than traditional tree-based codes, and to scale better than domain
decomposition-based approaches on hybrid shared/distributed-memory parallel
architectures, e.g. clusters of multi-cores, achieving a speedup
over the Gadget-2 simulation code.
Comment: Submitted to SIAM Journal on Scientific Computing
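The neighbour-finding core builds on cell decomposition: bin particles into cells at least as large as the interaction range, then search only adjacent cells for interacting pairs. A minimal 2D sketch of that idea follows (plain Python, uniform grid; the paper's hierarchical, sorted, task-based version is far more elaborate).

```python
from collections import defaultdict
from itertools import product

def neighbours(points, h):
    """Find all pairs within distance h using a uniform cell grid.
    Each particle only checks its own and the adjacent cells."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // h), int(y // h))].append(i)
    pairs = []
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j:  # each pair is reported exactly once
                        xi, yi = points[i]; xj, yj = points[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 <= h * h:
                            pairs.append((i, j))
    return pairs

pts = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9)]
print(neighbours(pts, h=0.2))  # [(0, 1)]
```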