The future of computing beyond Moore's Law.
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available for continuing to scale successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
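The doubling rule in the abstract compounds exponentially, which is why even a modest slowdown matters. The sketch below assumes an idealized, perfectly regular doubling period (the abstract only says "roughly every 2 years"); the function name and the 3-year "tapered" period are hypothetical, chosen purely for illustration.

```python
def moores_law_factor(years: float, doubling_period: float = 2.0) -> float:
    """Improvement factor after `years` if capability doubles every `doubling_period` years.

    Idealized model: factor = 2 ** (years / doubling_period).
    """
    if doubling_period <= 0:
        raise ValueError("doubling_period must be positive")
    return 2.0 ** (years / doubling_period)


if __name__ == "__main__":
    # Idealized 2-year doubling sustained for 50 years: 2**25
    print(f"{moores_law_factor(50):,.0f}")       # 33,554,432
    # Hypothetical tapering: one decade at a 2-year vs. a 3-year doubling period
    print(moores_law_factor(10))                  # 32.0
    print(round(moores_law_factor(10, 3.0), 1))   # 10.1
```

Under these assumptions, stretching the doubling period from 2 to 3 years cuts a decade's gain from 32x to roughly 10x.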
A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI
This work demonstrates a RISC-V vector microprocessor implemented in 28 nm FDSOI with fully integrated simultaneous-switching switched-capacitor DC-DC (SC DC-DC) converters and adaptive clocking; the converters generate four on-chip voltages between 0.45 and 1 V using only 1.0 V core and 1.8 V IO voltage inputs. The converters achieve high efficiency at the system level by switching simultaneously to avoid charge-sharing losses and by using an adaptive clock to maximize performance for the resulting voltage ripple. Details about the implementation of the DC-DC switches, DC-DC controller, and adaptive clock are provided, and the sources of conversion loss are analyzed based on measured results. This system pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20 ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80%-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices.
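The two efficiency figures in the abstract measure different things, and a little unit arithmetic makes that concrete. The sketch below converts GFLOPS/W into energy per FLOP and shows what a given conversion efficiency implies for supply-side power. The 100 mW load is a hypothetical value, and the abstract does not say whether the 26.2 GFLOPS/W figure includes converter losses, so the two calculations are kept independent.

```python
def picojoules_per_flop(gflops_per_watt: float) -> float:
    """Energy per floating-point operation, in pJ, for a given GFLOPS/W figure.

    1 W = 1 J/s, so X GFLOPS/W means X * 1e9 FLOPs per joule.
    """
    flops_per_joule = gflops_per_watt * 1e9
    return 1e12 / flops_per_joule  # 1 J = 1e12 pJ


def supply_power(load_power_w: float, efficiency: float) -> float:
    """Power drawn from the input rail to deliver load_power_w through a converter."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return load_power_w / efficiency


if __name__ == "__main__":
    print(f"{picojoules_per_flop(26.2):.1f} pJ/FLOP")  # 38.2 pJ/FLOP
    # Hypothetical 100 mW load at both ends of the reported efficiency range
    load_w = 0.100
    for eta in (0.80, 0.86):
        p_in = supply_power(load_w, eta)
        print(f"eta={eta:.2f}: input {p_in * 1e3:.1f} mW, loss {(p_in - load_w) * 1e3:.1f} mW")
```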
Validating Quantum-Classical Programming Models with Tensor Network Simulations
The exploration of hybrid quantum-classical algorithms and programming models
on noisy near-term quantum hardware has begun. As hybrid programs scale towards
classical intractability, validation and benchmarking are critical to
understanding the utility of the hybrid computational model. In this paper, we
demonstrate a newly developed quantum circuit simulator based on tensor network
theory that enables intermediate-scale verification and validation of hybrid
quantum-classical computing frameworks and programming models. We present our
tensor-network quantum virtual machine (TNQVM) simulator which stores a
multi-qubit wavefunction in a compressed (factorized) form as a matrix product
state, thus enabling single-node simulations of larger qubit registers, as
compared to brute-force state-vector simulators. Our simulator is designed to
be extensible in both the tensor network form and the classical hardware used
to run the simulation (multicore, GPU, distributed). The extensibility of the
TNQVM simulator with respect to the simulation hardware type is achieved via a
pluggable interface for different numerical backends (e.g., ITensor and
ExaTENSOR numerical libraries). We demonstrate the utility of our TNQVM quantum
circuit simulator through the verification of randomized quantum circuits and
the variational quantum eigensolver algorithm, both expressed within the
eXtreme-scale ACCelerator (XACC) programming model.
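The memory advantage of storing a wavefunction as a matrix product state (MPS) rather than a dense state vector can be estimated by counting amplitudes. The sketch below is a back-of-the-envelope estimate, not TNQVM's API. It assumes an open-boundary MPS with physical dimension 2, complex128 storage, and a hypothetical bond-dimension cap `max_bond_dim`. The MPS is exact only if the state's entanglement fits within that cap; otherwise truncation introduces approximation error.

```python
BYTES_PER_AMPLITUDE = 16  # complex128 = two float64 values


def state_vector_bytes(n_qubits: int) -> int:
    """Dense state vector: 2**n complex amplitudes."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


def mps_bytes(n_qubits: int, max_bond_dim: int) -> int:
    """Open-boundary MPS with physical dimension 2 and bond dimension <= max_bond_dim.

    Site k stores a tensor of shape (chi_{k-1}, 2, chi_k). The bond to the right
    of site k can never usefully exceed min(2**(k+1), 2**(n-k-1)), the largest
    possible Schmidt rank across that cut.
    """
    def bond(k: int) -> int:
        if k < 0 or k >= n_qubits - 1:
            return 1  # open boundaries
        return min(max_bond_dim, 2 ** (k + 1), 2 ** (n_qubits - k - 1))

    amplitudes = sum(bond(k - 1) * 2 * bond(k) for k in range(n_qubits))
    return amplitudes * BYTES_PER_AMPLITUDE


if __name__ == "__main__":
    n, chi = 50, 64  # hypothetical register size and bond-dimension cap
    print(f"state vector: {state_vector_bytes(n) / 2**50:.0f} PiB")  # 16 PiB
    print(f"MPS (chi={chi}): {mps_bytes(n, chi) / 2**20:.1f} MiB")   # 4.9 MiB
```

The dense representation grows as 2^n, while the MPS grows roughly as n·χ², which is what makes single-node simulation of larger registers feasible for weakly entangled states.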
Towards Lattice Quantum Chromodynamics on FPGA devices
In this paper we describe a single-node, double-precision Field Programmable
Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the
context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we
numerically invert the Dirac-Wilson operator on a 4-dimensional grid on three
Xilinx hardware platforms: the Zynq UltraScale+ evaluation board, the Alveo U250
accelerator and the largest device available on the market, the VU13P device.
In our implementation we separate software/hardware parts in such a way that
the entire multiplication by the Dirac operator is performed in hardware, and
the rest of the algorithm runs on the host. We find that the FPGA
implementation can offer performance comparable to that obtained using
current CPUs or Intel's many-core Xeon Phi accelerators. A possible multi-node
FPGA-based system is discussed, and we argue that power-efficient High
Performance Computing (HPC) systems can be implemented using FPGA devices only.
Comment: 17 pages, 4 figures
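The hardware/software split described in this abstract maps naturally onto a conjugate gradient loop that touches the operator only through a callback: the callback is the part that would be offloaded, and the dot products, vector updates and convergence test stay on the host. The sketch below is a plain-Python illustration under stated assumptions, not the authors' code. CG requires a symmetric (Hermitian) positive-definite operator, and the Dirac-Wilson operator D is not Hermitian, so in lattice QCD CG is commonly applied to the normal equations with D†D; here a small symmetric positive-definite matrix stands in for that operator.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_op: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive-definite A, given only v -> A v.

    apply_op stands in for the offloaded operator application; all other
    work (dot products, vector updates, convergence test) runs on the host.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A x, with initial guess x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_op(p)                 # the only call that touches the operator
        alpha = rs_old / dot(p, ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old           # makes the next direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]  # toy SPD matrix standing in for D^dagger D

    def apply_matrix(v: Vector) -> Vector:
        return [dot(row, v) for row in A]

    print(conjugate_gradient(apply_matrix, [1.0, 2.0]))  # approx [1/11, 7/11]
```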
ASC: A stream compiler for computing with FPGAs
Published version
X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories
Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic
have been the workhorse of state-of-the-art computing platforms. Despite
tremendous strides in scaling the ubiquitous metal-oxide-semiconductor
transistor, the underlying von Neumann computing architecture has
remained unchanged. The limited throughput and energy efficiency of
state-of-the-art computing systems result, to a large extent, from the
well-known von Neumann bottleneck. The energy and throughput inefficiency of
von Neumann machines has been accentuated in recent times by the present
emphasis on data-intensive applications such as artificial intelligence and
machine learning. A possible approach towards mitigating the overhead
associated with the von Neumann bottleneck is to enable in-memory
Boolean computations. In this manuscript, we present an augmented version of
the conventional SRAM bit-cells, called the X-SRAM, with the ability
to perform in-memory, vector Boolean computations in addition to the usual
memory storage operations. We propose at least six different schemes for
enabling in-memory vector computations, including NAND, NOR, IMP (implication)
and XOR logic gates, for two bit-cell topologies: the 8T cell
and the 8T Differential cell. In addition, we present a novel
'read-compute-store' scheme, wherein the computed Boolean function can
be stored directly in the memory without the need to latch the data and
carry out a subsequent write operation. The feasibility of the proposed
schemes has been verified using predictive transistor models and Monte Carlo
variation analysis.
Comment: This article has been accepted in a future issue of IEEE Transactions
on Circuits and Systems I: Regular Papers
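At the level of data flow, the in-memory vector Boolean operations and the read-compute-store idea can be illustrated with a small behavioural model: each row is a word of bits, a Boolean function is evaluated across all columns of two rows at once, and the result can be written straight into a destination row. This is a hypothetical sketch (the class and method names are invented for illustration); it models only what is computed and where it lands, not the X-SRAM bit-cell circuits, sensing schemes or timing.

```python
from typing import Callable, Dict, List


class InMemoryBooleanArray:
    """Behavioural model of an SRAM array whose rows are width-bit words."""

    # Each op maps two row words to their bitwise result (masked later).
    OPS: Dict[str, Callable[[int, int], int]] = {
        "NAND": lambda x, y: ~(x & y),
        "NOR": lambda x, y: ~(x | y),
        "XOR": lambda x, y: x ^ y,
        "IMP": lambda x, y: ~x | y,  # implication x -> y
    }

    def __init__(self, n_rows: int, width: int) -> None:
        self.mask = (1 << width) - 1
        self.rows: List[int] = [0] * n_rows

    def write(self, row: int, value: int) -> None:
        self.rows[row] = value & self.mask

    def read(self, row: int) -> int:
        return self.rows[row]

    def compute(self, op: str, a: int, b: int) -> int:
        # Vector operation over every column; masking keeps a width-bit result
        # after Python's ~ on (unbounded) ints
        return self.OPS[op](self.rows[a], self.rows[b]) & self.mask

    def read_compute_store(self, op: str, a: int, b: int, dest: int) -> None:
        # Result goes directly into row `dest`, with no separate read-out step
        self.rows[dest] = self.compute(op, a, b)


if __name__ == "__main__":
    mem = InMemoryBooleanArray(n_rows=3, width=4)
    mem.write(0, 0b1100)
    mem.write(1, 0b1010)
    for op in ("NAND", "NOR", "XOR", "IMP"):
        print(op, format(mem.compute(op, 0, 1), "04b"))
    # NAND 0111, NOR 0001, XOR 0110, IMP 1011
    mem.read_compute_store("XOR", 0, 1, dest=2)
    print(format(mem.read(2), "04b"))  # 0110
```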
CompF2: Theoretical Calculations and Simulation Topical Group Report
This report summarizes the work of the Computational Frontier topical group
on theoretical calculations and simulation for Snowmass 2021. We discuss the
challenges, potential solutions, and needs facing six diverse but related
topical areas that span the subject of theoretical calculations and simulation
in high energy physics (HEP): cosmic calculations, particle accelerator
modeling, detector simulation, event generators, perturbative calculations, and
lattice QCD (quantum chromodynamics). The challenges arise from the next
generations of HEP experiments, which will include more complex instruments,
provide larger data volumes, and perform more precise measurements.
Calculations and simulations will need to keep up with these increased
requirements. The other aspect of the challenge is the evolution of the computing
landscape away from general-purpose computing on CPUs and toward
special-purpose accelerators and coprocessors such as GPUs and FPGAs. These
newer devices can provide substantial improvements for certain categories of
algorithms, at the expense of more specialized programming and more specialized
memory and data access patterns.
Comment: Report of the Computational Frontier Topical Group on Theoretical
Calculations and Simulation for Snowmass 2021
- …