27 research outputs found

    Limits on Fundamental Limits to Computation

    Full text link
    An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented, compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits.Comment: 15 pages, 4 figures, 1 tabl

    Experimental Demonstrations of Native Implementation of Boolean Logic Hamiltonian in a Superconducting Quantum Annealer

    Full text link
    Experimental demonstrations of quantum annealing with native implementation of Boolean logic Hamiltonians are reported. As a superconducting integrated circuit, a problem Hamiltonian whose set of ground states is consistent with a given truth table is implemented for quantum annealing with no redundant qubits. As examples of the truth table, NAND and NOR are successfully fabricated as an identical circuit. Similarly, a native implementation of a multiplier comprising six superconducting flux qubits is also demonstrated. These native implementations of Hamiltonians consistent with Boolean logic provide an efficient and scalable way of applying annealing computation to so-called circuit satisfiability problems that aim to find a set of inputs consistent with a given output over any Boolean logic functions, especially those like factorization through a multiplier Hamiltonian. A proof-of-concept demonstration of a hybrid computing architecture for domain-specific quantum computing is described.Comment: 12 pages, 11 figure

    Accelerating board games through Hardware/Software Codesign

    Get PDF
    Board games applications usually offer a great user experience when running on desktop computers. Powerful high-performance processors working without energy restrictions successfully deal with the exploration of large game trees, delivering strong play to satisfy demanding users. However, nowadays, more and more game players are running these games on smartphones and tablets, where the lower computational power and limited power budget yield a much weaker play. Recent systems-on-a-chip include programmable logic tightly coupled with general-purpose processors enabling the inclusion of custom accelerators for any application to improve both performance and energy efficiency. In this paper, we analyze the benefits of partitioning the artificial intelligence of board games into software and hardware. We have chosen as case studies three popular and complex board games, Reversi, Blokus, and Connect6. The designs analyzed include hardware accelerators for board processing, which improve performance and energy efficiency by an order of magnitude leading to much stronger and battery-aware applications. The results demonstrate that the use of hardware/software codesign to develop board games allows sustaining or even improving the user experience across platforms while keeping power and energy low

    Performance and energy efficiency analysis of a Reversi player for FPGAs and General Purpose Processors

    Get PDF
    Board-game applications are frequently found in mobile devices where the computing performance and the energy budget are constrained. Since the Artificial Intelligence techniques applied in these games are computationally intensive, the applications developed for mobile systems are frequently simplistic, far from the level of equivalent applications developed for desktop computers. Currently board games are software applications executed on General Purpose Processors. However, they exhibit a medium degree of parallelism and a custom hardware accelerator implemented on an FPGA can take advantage of that. We have selected the well-known Reversi game as a case study because it is a very popular board game with simple rules but huge computational demands. We developed and optimized software and hardware designs for this game that apply the same classical Artificial Intelligence techniques. The applications have been executed on different representative platforms and the results demonstrate that the FPGAs implementations provide better performance, lower power consumption and, therefore, impressive energy savings. These results demonstrate that FPGAs can efficiently deal with this kind of problems

    Memory partitioning and scheduling co-optimization in behavioral synthesis

    Full text link
    Abstract—Achieving optimal throughput by extracting parallel-ism in behavioral synthesis often exaggerates memory bottleneck issues. Data partitioning is an important technique for increasing memory bandwidth by scheduling multiple simultaneous memory accesses to different memory banks. In this paper we present a vertical memory partitioning and scheduling algorithm that can generate a valid partition scheme for arbitrary affine memory inputs. It does this by arranging non-conflicting memory accesses across the border of loop iterations. A mixed memory partitioning and scheduling algorithm is also proposed to com-bine the advantages of the vertical and other state-of-art algo-rithms. A set of theorems is provided as criteria for selecting a valid partitioning scheme. This is followed by an optimal and scalable memory scheduling algorithm. By utilizing the property of constant strides between memory addresses in successive loop iterations, an address translation optimization technique for an arbitrary partition factor is proposed to improve performance, area and energy efficiency. Experimental results show that on a set of real-world medical image processing kernels, the proposed mixed algorithm with address translation optimization can gain speed-up, area reduction and power savings of 15.8%, 36 % and 32.4 % respectively, compared to the state-of-art memory parti-tioning algorithm

    Composable accelerator-rich microprocessor enhanced for adaptivity and longevity

    Get PDF
    Abstract Accelerator-rich platforms demonstrate orders of magnitude improvement in performance and energy efficiency over software, yet they lack adaptivity to new algorithms and can see low accelerator utilization. To address these issues we propose CAMEL: Composable Accelerator-rich Microprocessor Enhanced for Longevity. CAMEL features programmable fabric (PF) to extend the use of ASIC composable accelerators in supporting algorithms that are beyond the scope of the baseline platform. Using a combination of hardware extensions and compiler support, we demonstrate on average 11.6X performance improvement and 13.9X energy savings across benchmarks that deviate from the original domain for our baseline platform
    corecore