Search CORE

623 research outputs found

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

Author: Bell Steven Emberton
Cao Kaidi
Gao Mingyu
Ha Heonjae
Horowitz Mark
Kozyrakis Christos
Liu Qiaoyi
Nayak Ankita
Pu Jing
Raina Priyanka
Setter Jeff Ou
Yang Xuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/04/2020
Field of study

We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202

arXiv.org e-Print Archive

Crossref

A compact multi-chip-module implementation of a multi-precision neural network classifier

Author: Bermak Amine
Martinez Dominique
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2001
Field of study

This paper describes a novel MCM digital implementation of a reconfigurable multi-precision neural network classifier. The design is based on a scalable systolic architecture with a user defined topology and arithmetic precision of the neural network. Indeed, the MCM integrates 64/32/16 neurons with a corresponding accuracy of 4/8/16-bits. A prototype has been designed and successfully tested in CMOS 0.7 μm technolog

INRIA a CCSD electronic archive server

HAL Descartes

Research Online @ ECU

Hal-Diderot

Embedded Parallel Systolic Architecture For Multi-Filtering Techniques Using FPGA.

Author: Arshad M. R.
H. Salih Muataz
Publication venue
Publication date: 01/01/2010
Field of study

Computing systems typically suffer from delay in data processing

Crossref

Repository@USM

Empowering parallel computing with field programmable gate arrays

Author: D'Hollander Erik
Publication venue: 'IOS Press'
Publication date: 01/01/2020
Field of study

After more than 30 years, reconﬁgurable computing has grown from a concept to a mature ﬁeld of science and technology. The cornerstone of this evolution is the ﬁeld programmable gate array, a building block enabling the conﬁguration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural reﬁnements

Ghent University Academic Bibliography

Electrically reconfigurable logic array

Author: Agarwal R. K.
Publication venue
Publication date: 01/08/1982
Field of study

To compose the complicated systems using algorithmically specialized logic circuits or processors, one solution is to perform relational computations such as union, division and intersection directly on hardware. These relations can be pipelined efficiently on a network of processors having an array configuration. These processors can be designed and implemented with a few simple cells. In order to determine the state-of-the-art in Electrically Reconfigurable Logic Array (ERLA), a survey of the available programmable logic array (PLA) and the logic circuit elements used in such arrays was conducted. Based on this survey some recommendations are made for ERLA devices

NASA Technical Reports Server

An FPGA-based Web server for high performance biological sequence alignment

Author: Benkrid Abdsamad
Benkrid K.
Kasap S.
Liu Ying
Publication venue
Publication date: 01/01/2009
Field of study

Portsmouth University Research Portal (Pure)