Spin-Based Neuron Model with Domain Wall Magnets as Synapse
We present an artificial neural network design using spin devices that achieves
ultra low voltage operation, low power consumption, high speed, and high
integration density. We employ spin torque switched nano-magnets for modelling
neurons and domain wall magnets for compact, programmable synapses. The spin
based neuron-synapse units operate locally at an ultra low supply voltage of 30 mV,
resulting in low computation power. CMOS based inter-neuron communication is
employed to realize network-level functionality. We corroborate circuit
operation with physics based models developed for the spin devices. Simulation
results for character recognition as a benchmark application show 95% lower
power consumption compared to a 45 nm CMOS design.
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems
We have developed PGPG (Pipeline Generator for Programmable GRAPE),
software that generates the low-level design of the pipeline processor and
communication software for FPGA-based computing engines (FBCEs). An FBCE
typically consists of one or multiple FPGA (Field-Programmable Gate Array)
chips and local memory. Here, the term "Field-Programmable" means that one can
rewrite the logic implemented on the chip after the hardware is completed, and
therefore a single FBCE can be used for calculation of various functions, for
example pipeline processors for gravity, SPH interaction, or image processing.
The main problem with FBCEs is that the user needs to develop the detailed
hardware design for the processor to be implemented on FPGA chips. In addition,
she or he has to write the control logic for the processor, the communication and
data conversion library on the host processor, and the application program that
uses the developed processor. These require detailed knowledge of hardware
design, a hardware description language such as VHDL, the operating system and
the application, and the amount of human work is huge. A relatively simple design
would require 1 person-year or more. The PGPG software generates all necessary
design descriptions, except for the application software itself, from a
high-level design description of the pipeline processor in the PGPG language.
The PGPG language is a simple language, specialized to the description of
pipeline processors. Thus, designing a pipeline processor in the PGPG language is
much easier than the traditional design process. For real applications such as the
pipeline for gravitational interaction, the pipeline processor generated by
PGPG achieved performance similar to that of hand-written code. In this
paper we present a detailed description of PGPG version 1.0.
Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2
A Modular Programmable CMOS Analog Fuzzy Controller Chip
We present a highly modular fuzzy inference analog CMOS chip architecture with on-chip digital programmability. This chip consists of the interconnection of parameterized instances of two different kinds of blocks, namely label blocks and rule blocks. The architecture realizes a lattice partition of the universe of discourse, which at the hardware level means that the fuzzy labels associated with every input (realized by the label blocks) are shared among the rule blocks. This reduces the area and power consumption and is the key point for chip modularity. The proposed architecture is demonstrated through a 16-rule, two-input CMOS 1-μm prototype which features an operation speed of 2.5 Mflips (2.5×10^6 fuzzy inferences per second) with 8.6 mW power consumption. The core area of this prototype is only 1.6 mm², including the digital control and memory circuitry used for programmability. Because of the architecture's modularity, the number of inputs and rules can be increased with hardly any design effort.
This work was supported in part by the Spanish C.I.C.Y.T under Contract TIC96-1392-C02-02 (SIVA)
Empowering parallel computing with field programmable gate arrays
After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminating the instruction overhead and optimizing the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.