3,093 research outputs found

    An algorithm for DNA read alignment on quantum accelerators

    With small-scale quantum processors transitioning from experimental physics labs to industrial products, these processors allow us to efficiently compute important algorithms in various fields. In this paper, we propose a quantum algorithm to address the challenging field of big data processing for genome sequence reconstruction. This research describes an architecture-aware implementation of a quantum algorithm for sub-sequence alignment. A new algorithm named QiBAM (quantum indexed bidirectional associative memory) is proposed that uses approximate pattern-matching based on Hamming distances. QiBAM extends Grover's search algorithm in two ways to allow for: (1) approximate matches needed for read errors in genomics, and (2) a distributed search for multiple solutions over the quantum encoding of DNA sequences. This approach gives a quadratic speedup over the classical algorithm. A full implementation of the algorithm is provided and verified using the OpenQL compiler and QX simulator framework. This represents a first exploration towards a full-stack quantum accelerated genome sequencing pipeline design. The open-source implementation can be found at https://github.com/prince-ph0en1x/QAGS. Comment: Keywords: quantum algorithms, quantum search, DNA read alignment, genomics, associative memory, accelerators, in-memory computing
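For orientation, the classical baseline that a quadratic speedup would be measured against can be sketched as a linear scan with a Hamming-distance threshold. This is a minimal illustration and not the QiBAM implementation; the function names `hamming` and `approximate_matches` and the `max_mismatches` parameter are assumptions made for this sketch.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def approximate_matches(reference: str, read: str, max_mismatches: int):
    """Scan every window of `reference` and keep those within
    `max_mismatches` substitutions of `read`.

    Returns a list of (start_index, distance) tuples. The names and the
    threshold parameter are hypothetical, chosen for illustration only.
    """
    m = len(read)
    hits = []
    for start in range(len(reference) - m + 1):
        distance = hamming(reference[start:start + m], read)
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits
```

The scan inspects each of the N candidate windows once. A Grover-style search over the same index space would need on the order of the square root of N oracle calls, which is the sense in which the abstract's speedup is quadratic.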

    Application-Driven Near-Data Processing for Similarity Search

    Similarity search is key to a variety of applications including content-based search for images and video, recommendation systems, data deduplication, natural language processing, computer vision, databases, computational biology, and computer graphics. At its core, similarity search manifests as k-nearest neighbors (kNN), a computationally simple primitive consisting of highly parallel distance calculations and a global top-k sort. However, kNN is poorly supported by today's architectures because of its high memory bandwidth requirements. This paper proposes an application-driven near-data processing accelerator for similarity search: the Similarity Search Associative Memory (SSAM). By instantiating compute units close to memory, SSAM benefits from the higher memory bandwidth and density exposed by emerging memory technologies. We evaluate the SSAM design down to layout on top of the Micron hybrid memory cube (HMC), and show that SSAM can achieve up to two orders of magnitude area-normalized throughput and energy efficiency improvement over multicore CPUs; we also show SSAM is faster and more energy efficient than competing GPUs and FPGAs. Finally, we show that SSAM is also useful for other data intensive tasks like kNN index construction, and can be generalized to semantically function as a high capacity content addressable memory. Comment: 15 pages, 8 figures, 7 tables
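The kNN primitive described above (independent distance calculations followed by a global top-k selection) can be sketched in a few lines. This is a plain CPU illustration under assumed names (`knn`, `query`, `dataset`), not the SSAM design; Euclidean distance is also an assumption, since the abstract does not fix a metric.

```python
import heapq
import math


def knn(query, dataset, k):
    """Return the k nearest points to `query` as (distance, index) pairs.

    Step 1: one distance per data point; each is independent of the others,
            which is what makes the primitive highly parallel.
    Step 2: a global top-k selection over all distances.
    """
    scored = ((math.dist(query, point), index)
              for index, point in enumerate(dataset))
    return heapq.nsmallest(k, scored)
```

Every query touches every stored vector once while doing very little arithmetic per byte read, which is why the abstract identifies memory bandwidth rather than compute as the bottleneck.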

    FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads

    In this work, we propose FUSE, a novel GPU cache system that integrates spin-transfer torque magnetic random-access memory (STT-MRAM) into the on-chip L1D cache. FUSE can minimize the number of outgoing memory accesses over the interconnection network of the GPU's multiprocessors, which in turn can considerably improve the level of massive computing parallelism in GPUs. Specifically, FUSE predicts a read-level of GPU memory accesses by extracting GPU runtime information and places write-once-read-multiple (WORM) data blocks into the STT-MRAM, while accommodating write-multiple data blocks over a small portion of SRAM in the L1D cache. To further reduce the off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating the associativity with a limited number of tag comparators and I/O peripherals. Our evaluation results show that, in comparison to a traditional GPU cache, our proposed heterogeneous cache reduces the number of outgoing memory references by 32% across the interconnection network, thereby improving the overall performance by 217% and reducing energy cost by 53%.

    From Ansätze to Z-gates: a NASA View of Quantum Computing

    For the last few years, the NASA Quantum Artificial Intelligence Laboratory (QuAIL) has been performing research to assess the potential impact of quantum computers on challenging computational problems relevant to future NASA missions. A key aspect of this research is devising methods to most effectively utilize emerging quantum computing hardware. Research questions include what experiments on early quantum hardware would give the most insight into the potential impact of quantum computing, the design of algorithms to explore on such hardware, and the development of tools to minimize the quantum resource requirements. We survey work relevant to these questions, with a particular emphasis on our recent work in quantum algorithms and applications, in elucidating mechanisms of quantum mechanics and their uses for quantum computational purposes, and in simulation, compilation, and physics-inspired classical algorithms. To our early application thrusts in planning and scheduling, fault diagnosis, and machine learning, we add thrusts related to robustness of communication networks and the simulation of many-body systems for material science and chemistry. We provide a brief update on quantum annealing work, but concentrate on gate-model quantum computing research advances within the last couple of years. Comment: 20 pages plus extensive references, 3 figures

    Near-Term Quantum-Classical Associative Adversarial Networks

    We introduce a new hybrid quantum-classical adversarial machine learning architecture called a quantum-classical associative adversarial network (QAAN). This architecture consists of a classical generative adversarial network with a small auxiliary quantum Boltzmann machine that is simultaneously trained on an intermediate layer of the discriminator of the generative network. We numerically study the performance of QAANs compared to their classical counterparts on the MNIST and CIFAR-10 data sets, and show that QAANs attain a higher quality of learning when evaluated using the Inception score and the Fréchet Inception distance. As the QAAN architecture only relies on sampling simple local observables of a small quantum Boltzmann machine, this model is particularly amenable to implementation on the current and next generations of quantum devices. Comment: 11 pages, 9 figures

    Combined Compute and Storage: Configurable Memristor Arrays to Accelerate Search

    Emerging technologies present opportunities for system designers to meet the challenges presented by competing trends of big data analytics and limitations on CMOS scaling. Specifically, memristors are an emerging high-density technology where the individual memristors can be used as storage or to perform computation. The voltage applied across a memristor determines its behavior (storage vs. compute), which enables a configurable memristor substrate that can embed computation with storage. This paper explores accelerating point and range search queries as instances of the more general configurable combined compute and storage capabilities of memristor arrays. We first present MemCAM, a configurable memristor-based content addressable memory for the cases when fast, infrequent searches over large datasets are required. For frequent searches, memristor lifetime becomes a concern. To increase memristor array lifetime we introduce hybrid data structures that combine trees with MemCAM using conventional CMOS processor/cache hierarchies for the upper levels of the tree and configurable memristor technologies for lower levels. We use SPICE to analyze energy consumption and access time of memristors and use analytic models to evaluate the performance of configurable hybrid data structures. The results show that with acceptable energy consumption our configurable hybrid data structures improve performance of search intensive applications and achieve lifetime in years or decades under continuous queries. Furthermore, the configurability of memristor arrays and the proposed data structures provide opportunities to tune the trade-off between performance and lifetime, and the data structures can be easily adapted to future memristors or other technologies with improved endurance.

    Quantum Computer Architecture: Towards Full-Stack Quantum Accelerators

    This paper presents the definition and implementation of a quantum computer architecture to enable creating a new computational device: a quantum computer as an accelerator. In this paper, we present explicitly the idea of a quantum accelerator which contains the full stack of the layers of an accelerator. Such a stack starts at the highest level describing the target application of the accelerator. The next layer abstracts the quantum logic outlining the algorithm that is to be executed on the quantum accelerator. In our case, the logic is expressed in the universal quantum-classical hybrid computation language developed in the group, called OpenQL, which views the quantum processor as a computational accelerator. The OpenQL compiler translates the program to a common assembly language, called cQASM, which can be executed on a quantum simulator. The cQASM represents the instruction set that can be executed by the micro-architecture implemented in the quantum accelerator. In a subsequent step, the compiler can convert the cQASM to generate the eQASM, which is executable on a particular experimental device incorporating the platform-specific parameters. This way, we are able to distinguish clearly the experimental research towards better qubits, and the industrial and societal applications that need to be developed and executed on a quantum device. The first case offers experimental physicists a full-stack experimental platform using realistic qubits with decoherence and error rates, while the second case offers perfect qubits to the quantum application developer, with neither decoherence nor error rates. We conclude the paper by explicitly presenting three examples of full-stack quantum accelerators: for an experimental superconducting processor, for quantum accelerated genome sequencing, and for near-term generic optimisation problems based on quantum heuristic approaches.

    High-Performance Out-of-core Block Randomized Singular Value Decomposition on GPU

    Fast computation of singular value decomposition (SVD) is of great interest in various machine learning tasks. Recently, SVD methods based on randomized linear algebra have shown significant speedup in this regime. This paper attempts to further accelerate the computation by harnessing a modern computing architecture, namely the graphics processing unit (GPU), with the goal of processing large-scale data that may not fit in the GPU memory. It leads to a new block randomized algorithm that fully utilizes the power of GPUs and efficiently processes large-scale data in an out-of-core fashion. Our experiment shows that the proposed block randomized SVD (BRSVD) method outperforms existing randomized SVD methods in speed while retaining the same accuracy. We also show its application to convex robust principal component analysis, which shows significant speedup in computer vision applications.

    A Survey of Neuromorphic Computing and Neural Networks in Hardware

    Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast with the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant, starting with an accurate neuroscience model of how the brain works, to finding materials and engineering breakthroughs to build devices to support these models, to creating a programming framework so the systems can learn, to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research and motivations for neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion on the major research topics that need to be addressed in the coming years to see the promise of neuromorphic computing fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed.

    Efficient and Scalable Algorithms for Smoothed Particle Hydrodynamics on Hybrid Shared/Distributed-Memory Architectures

    This paper describes a new fast and implicitly parallel approach to neighbour-finding in multi-resolution Smoothed Particle Hydrodynamics (SPH) simulations. This new approach is based on hierarchical cell decompositions and sorted interactions, within a task-based formulation. It is shown to be faster than traditional tree-based codes, and to scale better than domain decomposition-based approaches on hybrid shared/distributed-memory parallel architectures, e.g. clusters of multi-cores, achieving a 40× speedup over the Gadget-2 simulation code. Comment: Submitted to SIAM Journal on Scientific Computing
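The basic idea behind cell-based neighbour-finding can be sketched with a single-level uniform grid. This is a deliberately simplified illustration: the paper's approach is hierarchical, sorts interactions and is task-based, none of which is reproduced here. The function name `neighbour_pairs` and the fixed cutoff `h` are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import product
import math


def neighbour_pairs(points, h):
    """Return sorted (i, j) pairs, i < j, whose distance is below `h`.

    Particles are binned into cells of edge length `h`, so any neighbour
    of a particle must lie in the same cell or one of the adjacent cells.
    """
    if not points:
        return []
    dim = len(points[0])

    # Bin each particle index by its integer cell coordinates.
    cells = defaultdict(list)
    for index, point in enumerate(points):
        cells[tuple(math.floor(c / h) for c in point)].append(index)

    pairs = set()
    for cell, members in cells.items():
        # Visit the cell itself and its 3**dim - 1 adjacent cells.
        for offset in product((-1, 0, 1), repeat=dim):
            other = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and math.dist(points[i], points[j]) < h:
                        pairs.add((i, j))
    return sorted(pairs)
```

Each cell-pair interaction is independent of the others, which hints at why such decompositions lend themselves to a task-based formulation on shared-memory nodes.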