
    Time-Sliced Quantum Circuit Partitioning for Modular Architectures

    Current quantum computer designs will not scale. To scale beyond small prototypes, quantum architectures will likely adopt a modular approach with clusters of tightly connected quantum bits and sparser connections between clusters. We exploit this clustering and the statically known control flow of quantum programs to create tractable partitioning heuristics that map quantum circuits to modular physical machines one time slice at a time. Specifically, we create optimized mappings for each time slice, accounting for the cost of moving data from the previous time slice and using a tunable lookahead scheme to reduce the cost of moving to future time slices. We compare our approach to a traditional statically mapped, owner-computes model. Our results show strict improvement over the static-mapping baseline: we reduce the non-local communication overhead by 89.8% in the best case and by 60.9% on average. Our techniques, unlike many exact solver methods, are computationally tractable. Comment: Appears in CF'20: ACM International Conference on Computing Frontiers.
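
    The per-slice mapping idea can be made concrete with a small sketch. The code below is a hypothetical illustration, not the paper's implementation: a circuit is given as a list of time slices (each a list of interacting qubit pairs), and each slice is remapped greedily while charging a movement cost against the previous slice's assignment and a decayed lookahead cost for upcoming slices. The function names, cost weights, and greedy move rule are all illustrative assumptions.

def slice_cost(assign, slices, t, lookahead=2, decay=0.5):
    """Non-local gates in slice t plus decayed non-local gates in the next slices."""
    cost = 0.0
    for k, gates in enumerate(slices[t:t + 1 + lookahead]):
        weight = decay ** k  # current slice has weight 1, future slices decay
        cost += weight * sum(assign[a] != assign[b] for a, b in gates)
    return cost


def partition(slices, n_qubits, n_clusters, capacity, move_cost=0.5, lookahead=2):
    # Start from a static round-robin mapping of qubits to clusters.
    assign = {q: q % n_clusters for q in range(n_qubits)}
    mappings = []
    for t in range(len(slices)):
        improved = True
        while improved:
            improved = False
            for a, b in slices[t]:
                if assign[a] == assign[b]:
                    continue  # already local in this slice
                # Try moving either endpoint to its partner's cluster.
                for qubit, dest in ((a, assign[b]), (b, assign[a])):
                    if sum(c == dest for c in assign.values()) >= capacity:
                        continue  # destination cluster is full
                    old_cost = slice_cost(assign, slices, t, lookahead)
                    previous = assign[qubit]
                    assign[qubit] = dest
                    new_cost = slice_cost(assign, slices, t, lookahead) + move_cost
                    if new_cost < old_cost:
                        improved = True
                        break  # keep the move
                    assign[qubit] = previous  # revert
        mappings.append(dict(assign))
    return mappings

    In this toy model a move is only accepted when the lookahead-weighted drop in non-local gates exceeds the movement charge, which is the same trade-off the paper's per-slice heuristics balance.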

    GNN for time-sliced quantum circuit partitioning


    Hungarian qubit assignment for optimized mapping of quantum circuits on multi-core architectures

    Modular quantum computing architectures offer a promising alternative to monolithic designs for overcoming the scaling limitations of current quantum computers. To achieve scalability beyond small prototypes, quantum architectures are expected to adopt a modular approach, featuring clusters of tightly connected quantum bits with sparser connections between these clusters. Efficiently distributing qubits across multiple processing cores is critical for improving the performance and scalability of quantum computing systems. To address this challenge, we propose the Hungarian Qubit Assignment (HQA) algorithm, which leverages the Hungarian algorithm to improve qubit-to-core assignment. HQA considers the interactions between qubits over the entire circuit, enabling fine-grained partitioning and enhanced qubit utilization. We compare HQA with state-of-the-art alternatives through comprehensive experiments using both real-world quantum algorithms and random quantum circuits. The results show that HQA outperforms existing methods, with an average improvement of 1.28×. This work was supported in part by the European Research Council (ERC) under Grant 101042080 (WINC) and in part by the European Innovation Council (EIC) Pathfinder scheme under Grant 101099697 (QUADRATURE). Peer reviewed. Postprint (author's final draft).
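
    As a rough illustration of how the Hungarian algorithm can drive qubit-to-core assignment, the sketch below assigns the qubits of one time slice to core "slots" by solving a linear assignment problem with scipy.optimize.linear_sum_assignment. The cost model (an interaction term plus a fixed movement penalty) and the slot construction are simplifying assumptions, not the HQA cost function from the paper.

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_slice(gates, prev_assign, n_cores, capacity, move_penalty=1.0):
    """Assign each qubit to a core for one time slice.

    gates: list of (q1, q2) pairs interacting in this slice.
    prev_assign: dict qubit -> core from the previous slice.
    Requires len(prev_assign) <= n_cores * capacity.
    """
    qubits = sorted(prev_assign)
    partners = {q: [] for q in qubits}
    for a, b in gates:
        partners[a].append(b)
        partners[b].append(a)
    # One column per core "slot" so the assignment respects core capacity.
    slots = [(core, s) for core in range(n_cores) for s in range(capacity)]
    cost = np.zeros((len(qubits), len(slots)))
    for i, q in enumerate(qubits):
        for j, (core, _) in enumerate(slots):
            # Interaction term: partners currently mapped to a different core.
            cost[i, j] = sum(prev_assign[p] != core for p in partners[q])
            # Movement term: penalize leaving the previous core.
            if prev_assign[q] != core:
                cost[i, j] += move_penalty
    rows, cols = linear_sum_assignment(cost)
    return {qubits[i]: slots[j][0] for i, j in zip(rows, cols)}

# Example: 4 qubits on 2 cores (capacity 3), one cross-core gate in this slice.
prev = {0: 0, 1: 0, 2: 1, 3: 1}
print(assign_slice([(1, 2)], prev, n_cores=2, capacity=3))

    Replicating each core as `capacity` columns is a standard trick for letting a linear assignment solver respect per-core capacity limits.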

    Characterizing the spatio-temporal qubit traffic of a quantum intranet aiming at modular quantum computer architectures

    Quantum many-core processors are envisioned as the ultimate solution for the scalability of quantum computers. Built from Noisy Intermediate-Scale Quantum (NISQ) chips interconnected in a kind of quantum intranet, they enable large algorithms to be executed on current and near-future technology. In order to optimize such architectures, it is crucial to develop tools that allow specific design space explorations. To this aim, in this paper we present a technique to perform a spatio-temporal characterization of quantum circuits running on multi-chip quantum computers. Specifically, we focus on the analysis of the qubit traffic resulting from operations that involve qubits residing in different cores, and hence quantum communication across chips, while also accounting for the amount of intra-core operations that occur between those communications. Using specific multi-core performance metrics and a complete set of benchmarks, our analysis showcases the opportunities that the proposed approach may provide to guide the design of multi-core quantum computers and their interconnects. Peer reviewed. Postprint (author's final draft).
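
    The kind of spatio-temporal breakdown described above can be sketched in a few lines. The snippet below is an illustrative assumption of one possible metric only, a per-slice split into intra-core and inter-core gate counts given a qubit-to-core assignment; the paper's actual multi-core metrics are richer.

def traffic_profile(slices, assigns):
    """slices: list of lists of (q1, q2) gates; assigns: per-slice dict qubit -> core."""
    profile = []
    for t, gates in enumerate(slices):
        core_of = assigns[t]
        inter = sum(core_of[a] != core_of[b] for a, b in gates)
        intra = len(gates) - inter
        profile.append({"slice": t, "intra_core": intra, "inter_core": inter})
    return profile

# Example: two cores, qubits 0-1 on core 0 and 2-3 on core 1, two slices.
slices = [[(0, 1), (2, 3)], [(1, 2)]]
assigns = [{0: 0, 1: 0, 2: 1, 3: 1}] * 2
print(traffic_profile(slices, assigns))
# -> [{'slice': 0, 'intra_core': 2, 'inter_core': 0},
#     {'slice': 1, 'intra_core': 0, 'inter_core': 1}]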

    Towards Scalable Circuit Partitioning for Multi-Core Quantum Architectures with Deep Reinforcement Learning

    Quantum computing holds immense potential for solving classically intractable problems by leveraging the unique properties of qubits. However, the scalability of quantum architectures remains a significant challenge. To address this issue, multi-core quantum architectures have been proposed. Yet, realizing such multi-core architectures poses multiple challenges in hardware, in algorithms, and at the interface between them. In particular, one of these challenges is how to optimally partition algorithms so that they fit within the cores of a multi-core quantum computer. This thesis presents a novel approach for scalable circuit partitioning on multi-core quantum architectures using Deep Reinforcement Learning. The objective is to surpass existing meta-heuristic algorithms, such as the FGP-rOEE partitioning algorithm, in terms of accuracy and scalability. This research contributes to the advancement of both quantum computing and graph partitioning techniques, offering new insights into the optimization of quantum systems. By addressing the challenges associated with scaling quantum computers, we pave the way for their practical implementation in solving computationally challenging problems.
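
    To make the reinforcement-learning framing concrete, here is a minimal, hypothetical environment sketch in which the state is the current qubit-to-core mapping plus the current slice's gates, an action moves one qubit to a core, and the reward penalizes inter-core gates. The class name, observation encoding, and reward are illustrative assumptions and do not reproduce the thesis's agent or the FGP-rOEE baseline.

import random

class PartitionEnv:
    def __init__(self, slices, n_qubits, n_cores, capacity):
        self.slices, self.n_qubits = slices, n_qubits
        self.n_cores, self.capacity = n_cores, capacity
        self.reset()

    def reset(self):
        self.t = 0
        self.assign = [q % self.n_cores for q in range(self.n_qubits)]
        return self._state()

    def _state(self):
        # Observation: current mapping plus the interacting pairs of the current slice.
        return (tuple(self.assign), tuple(self.slices[self.t]))

    def step(self, action):
        qubit, core = action
        if self.assign.count(core) < self.capacity:
            self.assign[qubit] = core  # apply the move if the core has room
        # Reward: negative count of inter-core gates in the current slice.
        reward = -sum(self.assign[a] != self.assign[b] for a, b in self.slices[self.t])
        self.t += 1
        done = self.t >= len(self.slices)
        return (None if done else self._state()), reward, done

# Random-policy rollout, just to show the interface (the state is unused here).
env = PartitionEnv(slices=[[(0, 1)], [(1, 2)], [(0, 3)]], n_qubits=4, n_cores=2, capacity=3)
state, done, total = env.reset(), False, 0
while not done:
    action = (random.randrange(4), random.randrange(2))
    state, reward, done = env.step(action)
    total += reward
print("episode reward:", total)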

    Research works on electronic system-level design, FPGA testing, and security building blocks

    This document presents an overview of the research activity carried out by the author up to the date of writing. It also reports on the main results generated by a few funded projects involving the author as a team member. The activity covered a range of topics: automated generation of on-chip multiprocessor systems from high-level code, with particular emphasis on the system interconnect and the memory subsystems; design automation and test techniques for hardware-reconfigurable technologies; the design of advanced hardware blocks for cryptographic and cryptanalytical applications; and the implementation and evaluation of security services in distributed environments, with special focus on time-stamping and public-key certification services, as well as the interplay between security services and hardware reconfigurability. The document presents the main highlights from the published works resulting from each of the above research threads.

    Compilation for Quantum Computing on Chiplets

    Chiplet architecture is an emerging architecture for quantum computing that could significantly increase qubit resources thanks to its scalability and modularity. However, as the computing scale increases, communication between qubits becomes a more severe bottleneck due to the long routing distances. In this paper, we trade ancillary qubits for program concurrency by proposing a multi-entry communication highway mechanism and building a compilation framework to efficiently manage and utilize the highway resources. Our evaluation shows that this framework significantly outperforms the baseline approach in both circuit depth and the number of operations on typical quantum benchmarks, leading to more efficient and less error-prone compilation of quantum programs.
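
    The trade-off between ancillary qubits and concurrency can be illustrated with a toy scheduler: inter-chip gates compete for a limited number of highway entry points, and adding entries (i.e., spending more ancillary qubits) lets more of them execute in the same layer. This is a hypothetical illustration only; it is not the paper's highway mechanism or compilation framework, and the function name and layering rule are assumptions.

def schedule_highway(inter_chip_gates, n_entries):
    """Greedy layering: each layer runs at most n_entries inter-chip gates,
    and no qubit appears twice in a layer. Returns the list of layers."""
    remaining = list(inter_chip_gates)
    layers = []
    while remaining:
        layer, busy = [], set()
        for gate in list(remaining):
            if len(layer) >= n_entries:
                break  # all highway entries are occupied in this layer
            if busy.isdisjoint(gate):
                layer.append(gate)
                busy.update(gate)
                remaining.remove(gate)
        layers.append(layer)
    return layers

gates = [(0, 8), (1, 9), (2, 10), (3, 11)]  # qubit pairs on different chiplets
print(len(schedule_highway(gates, n_entries=1)))  # 4 layers: fully serialized
print(len(schedule_highway(gates, n_entries=4)))  # 1 layer: fully concurrent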

    Research Activities on FPGA Design, Cryptographic Hardware, and Security Services

    This paper reports on the main research results achieved by the author, including activities carried out in the context of funded research projects, until 2012. The report presents an overview of the findings involving cryptographic hardware, as well as the results related to the acceleration of cryptanalytical algorithms. Another major research line involved FPGA design automation and testing. The above results were complemented by works on security service provisioning in distributed environments. The report gives an exhaustive description of the scientific works derived from the above activities, indicating the essential insights behind each of them and the main results collected from the experimental evaluation.

    Overview of research results on hardware-accelerated cryptography and security

    This paper provides an overview of the research findings related to cryptographic hardware, acceleration of cryptanalytical algorithms, FPGA design automation and testing, as well as security service provisioning, achieved by the author up to the time of writing. The paper also covers a few results developed in the framework of funded research projects that involved the author as a team member. The text briefly describes the implications of the main research results, indicating the corresponding publication and the essential insight behind each work.

    Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

    Sparse matrices are a key ingredient of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring the data, for which many sparse formats have been proposed to avoid storing zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful at enabling faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization -- the classic challenge of sparse problems -- their decompression mechanism can potentially create a computation bottleneck. Not only is this challenge unresolved, but it also becomes more serious with the advent of domain-specific architectures (DSAs), as they aim to improve performance more aggressively. The performance implications of using various formats along with DSAs, however, have not been extensively studied by prior work. To fill this gap, we characterize the impact of seven frequently used sparse formats on performance, based on a DSA for sparse matrix-vector multiplication (SpMV) implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and optimize the HLS implementation of decompression for each format. We thoroughly explore diverse metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads. Comment: 11 pages, 14 figures, 2 tables.
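
    To see where a format's decompression work sits relative to the arithmetic, here is a minimal sparse matrix-vector multiply over the CSR (compressed sparse row) format in plain Python. It is a didactic sketch only: CSR is shown because it is one of the most common sparse formats, not because it necessarily matches the paper's format set, and the paper's designs are HLS implementations on an FPGA rather than Python code.

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored as CSR arrays (values, col_idx, row_ptr)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        # "Decompression" step: recover this row's nonzeros from row_ptr/col_idx,
        # interleaved with every multiply-accumulate.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y

# 3x3 matrix [[5, 0, 0], [0, 0, 3], [2, 0, 1]] in CSR form.
values, col_idx, row_ptr = [5.0, 3.0, 2.0, 1.0], [0, 2, 0, 2], [0, 1, 2, 4]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [5.0, 3.0, 3.0]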