4,587 research outputs found
EIE: Efficient Inference Engine on Compressed Deep Neural Network
State-of-the-art deep neural networks (DNNs) have hundreds of millions of
connections and are both computationally and memory intensive, making them
difficult to deploy on embedded systems with limited hardware resources and
power budgets. While custom hardware helps the computation, fetching weights
from DRAM is two orders of magnitude more expensive than ALU operations, and
dominates the required power.
Previously proposed 'Deep Compression' makes it possible to fit large DNNs
(AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by
pruning the redundant connections and having multiple connections share the
same weight. We propose an energy efficient inference engine (EIE) that
performs inference on this compressed network model and accelerates the
resulting sparse matrix-vector multiplication with weight sharing. Going from
DRAM to SRAM gives EIE 120x energy saving; Exploiting sparsity saves 10x;
Weight sharing gives 8x; Skipping zero activations from ReLU saves another 3x.
Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster when compared to
CPU and GPU implementations of the same DNN without compression. EIE has a
processing power of 102GOPS/s working directly on a compressed network,
corresponding to 3TOPS/s on an uncompressed network, and processes FC layers of
AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW. It is
24,000x and 3,400x more energy efficient than a CPU and GPU respectively.
Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy
efficiency and area efficiency.Comment: External Links: TheNextPlatform: http://goo.gl/f7qX0L ; O'Reilly:
https://goo.gl/Id1HNT ; Hacker News: https://goo.gl/KM72SV ; Embedded-vision:
http://goo.gl/joQNg8 ; Talk at NVIDIA GTC'16: http://goo.gl/6wJYvn ; Talk at
Embedded Vision Summit: https://goo.gl/7abFNe ; Talk at Stanford University:
https://goo.gl/6lwuer. Published as a conference paper in ISCA 201
NASA SBIR abstracts of 1990 phase 1 projects
The research objectives of the 280 projects placed under contract in the National Aeronautics and Space Administration (NASA) 1990 Small Business Innovation Research (SBIR) Phase 1 program are described. The basic document consists of edited, non-proprietary abstracts of the winning proposals submitted by small businesses in response to NASA's 1990 SBIR Phase 1 Program Solicitation. The abstracts are presented under the 15 technical topics within which Phase 1 proposals were solicited. Each project was assigned a sequential identifying number from 001 to 280, in order of its appearance in the body of the report. The document also includes Appendixes to provide additional information about the SBIR program and permit cross-reference in the 1990 Phase 1 projects by company name, location by state, principal investigator, NASA field center responsible for management of each project, and NASA contract number
Low power and high performance heterogeneous computing on FPGAs
L'abstract è presente nell'allegato / the abstract is in the attachmen
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Accessible machine learning algorithms, software, and diagnostic tools for
energy-efficient devices and systems are extremely valuable across a broad
range of application domains. In scientific domains, real-time near-sensor
processing can drastically improve experimental design and accelerate
scientific discoveries. To support domain scientists, we have developed hls4ml,
an open-source software-hardware codesign workflow to interpret and translate
machine learning algorithms for implementation with both FPGA and ASIC
technologies. We expand on previous hls4ml work by extending capabilities and
techniques towards low-power implementations and increased usability: new
Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long
pipeline kernels for low power, and new device backends include an ASIC
workflow. Taken together, these and continued efforts in hls4ml will arm a new
generation of domain scientists with accessible, efficient, and powerful tools
for machine-learning-accelerated discovery.Comment: 10 pages, 8 figures, TinyML Research Symposium 202
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual; thus, time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. It, in turn, improves the IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past towards VLSI design and manufacturing. Moreover, we
discuss the scope of AI/ML applications in the future at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations
Recommended from our members
ANALOG SIGNAL PROCESSING SOLUTIONS AND DESIGN OF MEMRISTOR-CMOS ANALOG CO-PROCESSOR FOR ACCELERATION OF HIGH-PERFORMANCE COMPUTING APPLICATIONS
Emerging applications in the field of machine vision, deep learning and scientific simulation require high computational speed and are run on platforms that are size, weight and power constrained. With the transistor scaling coming to an end, existing digital hardware architectures will not be able to meet these ever-increasing demands. Analog computation with its rich set of primitives and inherent parallel architecture can be faster, more efficient and compact for some of these applications. The major contribution of this work is to show that analog processing can be a viable solution to this problem. This is demonstrated in the three parts of the dissertation.
In the first part of the dissertation, we demonstrate that analog processing can be used to solve the problem of stereo correspondence. Novel modifications to the algorithms are proposed which improves the computational speed and makes them efficiently implementable in analog hardware. The analog domain implementation provides further speedup in computation and has lower power consumption than a digital implementation.
In the second part of the dissertation, a prototype of an analog processor was developed using commercially available off-the-shelf components. The focus was on providing experimental results that demonstrate functionality and to show that the performance of the prototype for low-level and mid-level image processing tasks is equivalent to a digital implementation. To demonstrate improvement in speed and power consumption, an integrated circuit design of the analog processor was proposed, and it was shown that such an analog processor would be faster than state-of-the-art digital and other analog processors.
In the third part of the dissertation, a memristor-CMOS analog co-processor that can perform floating point vector matrix multiplication (VMM) is proposed. VMM computation underlies some of the major applications. To demonstrate the working of the analog co-processor at a system level, a new tool called PSpice Systems Option is used. It is shown that the analog co-processor has a superior performance when compared to the projected performances of digital and analog processors. Using the new tool, various application simulations for image processing and solution to partial differential equations are performed on the co-processor model
A Comprehensive Workflow for General-Purpose Neural Modeling with Highly Configurable Neuromorphic Hardware Systems
In this paper we present a methodological framework that meets novel
requirements emerging from upcoming types of accelerated and highly
configurable neuromorphic hardware systems. We describe in detail a device with
45 million programmable and dynamic synapses that is currently under
development, and we sketch the conceptual challenges that arise from taking
this platform into operation. More specifically, we aim at the establishment of
this neuromorphic system as a flexible and neuroscientifically valuable
modeling tool that can be used by non-hardware-experts. We consider various
functional aspects to be crucial for this purpose, and we introduce a
consistent workflow with detailed descriptions of all involved modules that
implement the suggested steps: The integration of the hardware interface into
the simulator-independent model description language PyNN; a fully automated
translation between the PyNN domain and appropriate hardware configurations; an
executable specification of the future neuromorphic system that can be
seamlessly integrated into this biology-to-hardware mapping process as a test
bench for all software layers and possible hardware design modifications; an
evaluation scheme that deploys models from a dedicated benchmark library,
compares the results generated by virtual or prototype hardware devices with
reference software simulations and analyzes the differences. The integration of
these components into one hardware-software workflow provides an ecosystem for
ongoing preparative studies that support the hardware design process and
represents the basis for the maturity of the model-to-hardware mapping
software. The functionality and flexibility of the latter is proven with a
variety of experimental results
- …