Search CORE

3,893 research outputs found

Ravel-XL: a hardware accelerator for assigned-delay compiled-code logic gate simulation

Author: Brown R. B.
Marques-Silva J. P.
Riepe M. A.
Sakallah K. A.
Publication venue
Publication date: 01/03/1996
Field of study

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

Author: Chandra Vikas
Chau Benson
Esmaeilzadeh Hadi
Kim Joon Kyung
Lai Liangzhen
Park Jongse
Sharma Hardik
Suda Naveen
Publication venue
Publication date: 30/05/2018
Field of study

Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of BitFusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss and Stripes. In the same area, frequency, and process technology, BitFusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, BitFusion provides 2.6x speedup and 3.9x energy reduction at 45 nm node when BitFusion area and frequency are set to those of Stripes. Scaling to GPU technology node of 16 nm, BitFusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while BitFusion merely consumes 895 milliwatts of power

arXiv.org e-Print Archive

Crossref

Accelerated Financial Applications through Specialized Hardware, FPGA

Author: Dang Tri Quang
Rothermel John Mark
Publication venue: Digital WPI
Publication date: 13/12/2007
Field of study

This project will investigate Field Programmable Gate Array (FPGA) technology in financial applications. FPGA implementation in high performance computing is still in its infancy. Certain companies like XtremeData inc. advertized speed improvements of 50 to 1000 times for DNA sequencing using FPGAs, while using an FPGA as a coprocessor to handle specific tasks provides two to three times more processing power. FPGA technology increases performance by parallelizing calculations. This project will specifically address speed and accuracy improvements of both fundamental and transcendental functions when implemented using FPGA technology. The results of this project will lead to a series of recommendations for effective utilization of FPGA technology in financial applications

DigitalCommons@WPI

Scheduling for Large-Scale Systems

Author: Amestoy P.
Baleani M.
Brucker P.
Casanova H.
Garey M. R.
Kolettis N.
Lam C.-C.
Melhem R.
Pinedo M. L.
Prathipati R. B.
Puterman M. L.
Rayward-Smith V. J.
Sarkar V.
Shirazi B. A.
Uçar B.
Publication venue: 'Informa UK Limited'
Publication date
Field of study

Crossref

Study of Various Motherboards

Author: Vyas Bimal H.
Publication venue
Publication date: 01/01/2007
Field of study

Not availabl

Etheses - A Saurashtra University Library Service

CROSS-LAYER CUSTOMIZATION FOR LOW POWER AND HIGH PERFORMANCE EMBEDDED MULTI-CORE PROCESSORS

Author: Yu Chenjie
Publication venue
Publication date: 01/01/2010
Field of study

Due to physical limitations and design difficulties, computer processor architecture has shifted to multi-core and even many-core based approaches in recent years. Such architectures provide potentials for sustainable performance scaling into future peta-scale/exa-scale computing platforms, at affordable power budget, design complexity, and verification efforts. To date, multi-core processor products have been replacing uni-core processors in almost every market segment, including embedded systems, general-purpose desktops and laptops, and super computers. However, many issues still remain with multi-core processor architectures that need to be addressed before their potentials could be fully realized. People in both academia and industry research community are still seeking proper ways to make efficient and effective use of these processors. The issues involve hardware architecture trade-offs, the system software service, the run-time management, and user application design, which demand more research effort into this field. Due to the architectural specialties with multi-core based computers, a Cross-Layer Customization framework is proposed in this work, which combines application specific information and system platform features, along with necessary operating system service support, to achieve exceptional power and performance efficiency for targeted multi-core platforms. Several topics are covered with specific optimization goals, including snoop cache coherence protocol, inter-core communication for producer-consumer applications, synchronization mechanisms, and off-chip memory bandwidth limitations. Analysis of benchmark program execution with conventional mechanisms is made to reveal the overheads in terms of power and performance. Specific customizations are proposed to eliminate such overheads with support from hardware, system software, compiler, and user applications. Experiments show significant improvement on system performance and power efficiency

Digital Repository at the University of Maryland

Introductory Microcontroller Programming

Author: Alley Peter J
Publication venue: Digital WPI
Publication date: 28/04/2011
Field of study

This text is a treatise on microcontroller programming. It introduces the major peripherals found on most microcontrollers, including the usage of them, focusing on the ATmega644p in the AVR family produced by Atmel. General information and background knowledge on several topics is also presented. These topics include information regarding the hardware of a microcontroller and assembly code as well as instructions regarding good program structure and coding practices. Examples with code and discussion are presented throughout. This is intended for hobbyists and students desiring knowledge on programming microcontrollers, and is written at a level that students entering the junior level core robotics classes would find useful

DigitalCommons@WPI