27 research outputs found
Limits on Fundamental Limits to Computation
An indispensable part of our lives, computing has also become essential to
industries and governments. Steady improvements in computer hardware have been
supported by periodic doubling of transistor densities in integrated circuits
over the last fifty years. Such Moore scaling now requires increasingly heroic
efforts, stimulating research in alternative hardware and stirring controversy.
To help evaluate emerging technologies and enrich our understanding of
integrated-circuit scaling, we review fundamental limits to computation: in
manufacturing, energy, physical space, design and verification effort, and
algorithms. To outline what is achievable in principle and in practice, we
recall how some limits were circumvented, compare loose and tight limits. We
also point out that engineering difficulties encountered by emerging
technologies may indicate yet-unknown limits.Comment: 15 pages, 4 figures, 1 tabl
Experimental Demonstrations of Native Implementation of Boolean Logic Hamiltonian in a Superconducting Quantum Annealer
Experimental demonstrations of quantum annealing with native implementation
of Boolean logic Hamiltonians are reported. As a superconducting integrated
circuit, a problem Hamiltonian whose set of ground states is consistent with a
given truth table is implemented for quantum annealing with no redundant
qubits. As examples of the truth table, NAND and NOR are successfully
fabricated as an identical circuit. Similarly, a native implementation of a
multiplier comprising six superconducting flux qubits is also demonstrated.
These native implementations of Hamiltonians consistent with Boolean logic
provide an efficient and scalable way of applying annealing computation to
so-called circuit satisfiability problems that aim to find a set of inputs
consistent with a given output over any Boolean logic functions, especially
those like factorization through a multiplier Hamiltonian. A proof-of-concept
demonstration of a hybrid computing architecture for domain-specific quantum
computing is described.Comment: 12 pages, 11 figure
Accelerating board games through Hardware/Software Codesign
Board games applications usually offer a great user experience when running on desktop computers. Powerful high-performance processors working without energy restrictions successfully deal with the exploration of large game trees, delivering strong play to satisfy demanding users. However, nowadays, more and more game players are running these games on smartphones and tablets, where the lower computational power and limited power budget yield a much weaker play. Recent systems-on-a-chip include programmable logic tightly coupled with general-purpose processors enabling the inclusion of custom accelerators for any application to improve both performance and energy efficiency. In this paper, we analyze the benefits of partitioning the artificial intelligence of board games into software and hardware. We have chosen as case studies three popular and complex board games, Reversi, Blokus, and Connect6. The designs analyzed include hardware accelerators for board processing, which improve performance and energy efficiency by an order of magnitude leading to much stronger and battery-aware applications. The results demonstrate that the use of hardware/software codesign to develop board games allows sustaining or even improving the user experience across platforms while keeping power and energy low
Performance and energy efficiency analysis of a Reversi player for FPGAs and General Purpose Processors
Board-game applications are frequently found in mobile devices where the computing performance and the energy budget are constrained. Since the Artificial Intelligence techniques applied in these games are computationally intensive, the applications developed for mobile systems are frequently simplistic, far from the level of equivalent applications developed for desktop computers.
Currently board games are software applications executed on General Purpose Processors. However, they exhibit a medium degree of parallelism and a custom hardware accelerator implemented on an FPGA can take advantage of that.
We have selected the well-known Reversi game as a case study because it is a very popular board game with simple rules but huge computational demands. We developed and optimized software and hardware designs for this game that apply the same classical Artificial Intelligence techniques. The applications have been executed on different representative platforms and the results demonstrate that the FPGAs implementations provide better performance, lower power consumption and, therefore, impressive energy savings. These results demonstrate that FPGAs can efficiently deal with this kind of problems
Memory partitioning and scheduling co-optimization in behavioral synthesis
Abstract—Achieving optimal throughput by extracting parallel-ism in behavioral synthesis often exaggerates memory bottleneck issues. Data partitioning is an important technique for increasing memory bandwidth by scheduling multiple simultaneous memory accesses to different memory banks. In this paper we present a vertical memory partitioning and scheduling algorithm that can generate a valid partition scheme for arbitrary affine memory inputs. It does this by arranging non-conflicting memory accesses across the border of loop iterations. A mixed memory partitioning and scheduling algorithm is also proposed to com-bine the advantages of the vertical and other state-of-art algo-rithms. A set of theorems is provided as criteria for selecting a valid partitioning scheme. This is followed by an optimal and scalable memory scheduling algorithm. By utilizing the property of constant strides between memory addresses in successive loop iterations, an address translation optimization technique for an arbitrary partition factor is proposed to improve performance, area and energy efficiency. Experimental results show that on a set of real-world medical image processing kernels, the proposed mixed algorithm with address translation optimization can gain speed-up, area reduction and power savings of 15.8%, 36 % and 32.4 % respectively, compared to the state-of-art memory parti-tioning algorithm
Composable accelerator-rich microprocessor enhanced for adaptivity and longevity
Abstract Accelerator-rich platforms demonstrate orders of magnitude improvement in performance and energy efficiency over software, yet they lack adaptivity to new algorithms and can see low accelerator utilization. To address these issues we propose CAMEL: Composable Accelerator-rich Microprocessor Enhanced for Longevity. CAMEL features programmable fabric (PF) to extend the use of ASIC composable accelerators in supporting algorithms that are beyond the scope of the baseline platform. Using a combination of hardware extensions and compiler support, we demonstrate on average 11.6X performance improvement and 13.9X energy savings across benchmarks that deviate from the original domain for our baseline platform