938 research outputs found
The BrainScaleS-2 Neuromorphic Platform — A Report on the Integration and Operation of an Open Science Hardware Platform within EBRAINS
This report presents the challenges encountered and the solutions created for the operation of the BrainScaleS neuromorphic platform, and the overall progress leading to this state at the end of the Human Brain Project (HBP).
Limits on Fundamental Limits to Computation
An indispensable part of our lives, computing has also become essential to
industries and governments. Steady improvements in computer hardware have been
supported by periodic doubling of transistor densities in integrated circuits
over the last fifty years. Such Moore scaling now requires increasingly heroic
efforts, stimulating research in alternative hardware and stirring controversy.
To help evaluate emerging technologies and enrich our understanding of
integrated-circuit scaling, we review fundamental limits to computation: in
manufacturing, energy, physical space, design and verification effort, and
algorithms. To outline what is achievable in principle and in practice, we
recall how some limits were circumvented and compare loose and tight limits. We
also point out that engineering difficulties encountered by emerging
technologies may indicate yet-unknown limits.
Comment: 15 pages, 4 figures, 1 table
Intrinsically Evolvable Artificial Neural Networks
Dedicated hardware implementations of neural networks promise faster, lower-power operation than software implementations executing on processors. Unfortunately, most custom hardware implementations do not support intrinsic training of these networks on-chip. Training is typically done using offline software simulations, and the resulting network is synthesized and targeted to the hardware offline. The FPGA design presented here facilitates on-chip intrinsic training of artificial neural networks. Block-based neural networks (BbNNs), the type of artificial neural network implemented here, are grid-based networks of neuron blocks. These networks are trained using genetic algorithms to simultaneously optimize the network structure and the internal synaptic parameters. The design supports online structure and parameter updates, and is an intrinsically evolvable BbNN platform supporting functional-level hardware evolution. Functional-level evolvable hardware (EHW) uses evolutionary algorithms to evolve interconnections and internal parameters of functional modules in reconfigurable computing (RC) systems such as FPGAs. Functional modules can be any hardware modules, such as multipliers, adders, and trigonometric functions. In the implementation presented, the functional module is a neuron block. The designed platform is suitable for applications in dynamic environments, and can be adapted and retrained online. The online training capability has been demonstrated using a case study. A performance characterization model for RC implementations of BbNNs has also been presented.
MLPF: Efficient machine-learned particle-flow reconstruction using graph neural networks
In general-purpose particle detectors, the particle-flow algorithm may be
used to reconstruct a comprehensive particle-level view of the event by
combining information from the calorimeters and the trackers, significantly
improving the detector resolution for jets and the missing transverse momentum.
In view of the planned high-luminosity upgrade of the CERN Large Hadron
Collider (LHC), it is necessary to revisit existing reconstruction algorithms
and ensure that both the physics and computational performance are sufficient
in an environment with many simultaneous proton-proton interactions (pileup).
Machine learning may offer a prospect for computationally efficient event
reconstruction that is well-suited to heterogeneous computing platforms, while
significantly improving the reconstruction quality over rule-based algorithms
for granular detectors. We introduce MLPF, a novel, end-to-end trainable,
machine-learned particle-flow algorithm based on parallelizable,
computationally efficient, and scalable graph neural networks optimized using a
multi-task objective on simulated events. We report the physics and
computational performance of the MLPF algorithm on a Monte Carlo dataset of top
quark-antiquark pairs produced in proton-proton collisions in conditions
similar to those expected for the high-luminosity LHC. The MLPF algorithm
improves the physics response with respect to a rule-based benchmark algorithm
and demonstrates computationally scalable particle-flow reconstruction in a
high-pileup environment.
Comment: 15 pages, 10 figures
Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware
The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were developed for existing architectures when used on emerging heterogeneous systems.
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
In recent years, the field of Deep Learning has seen many disruptive and
impactful advancements. Given the increasing complexity of deep neural
networks, the need for efficient hardware accelerators has become more and more
pressing to design heterogeneous HPC platforms. The design of Deep Learning
accelerators requires a multidisciplinary approach, combining expertise from
several areas, spanning from computer architecture to approximate computing,
computational models, and machine learning algorithms. Several methodologies
and tools have been proposed to design accelerators for Deep Learning,
including hardware-software co-design approaches, high-level synthesis methods,
specific customized compilers, and methodologies for design space exploration,
modeling, and simulation. These methodologies aim to maximize the exploitable
parallelism and minimize data movement to achieve high performance and energy
efficiency. This survey provides a holistic review of the most influential
design methodologies and EDA tools proposed in recent years to implement Deep
Learning accelerators, offering the reader a wide perspective in this rapidly
evolving field. In particular, this work complements the previous survey
proposed by the same authors in [203], which focuses on Deep Learning hardware
accelerators for heterogeneous HPC platforms.