
    Maximizing CNN Accelerator Efficiency Through Resource Partitioning

    Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed. However, this approach leads to inefficient designs because the same processor structure is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator paradigm and an accompanying automated design methodology that partitions the available FPGA resources into multiple processors, each of which is tailored for a different subset of the CNN convolutional layers. Using the same FPGA resources as a single large processor, multiple smaller specialized processors increase computational efficiency and lead to a higher overall throughput. Our design methodology achieves 3.8x higher throughput than the state-of-the-art approach when evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x, respectively.
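
    The gain comes from matching each processor's parallelism to the dimensions of the layers it serves. As a rough illustration (not the paper's model or methodology), the Python sketch below uses invented layer shapes, an assumed multiplier budget, and an assumed clock to compare a single processor, which must pick one multiplier-array shape for every layer, against two specialized processors that each pick a shape for their own subset of layers.

```python
import math

# Toy layer shapes loosely modeled on AlexNet's convolutional layers:
# (input channels N, output channels M, spatial MACs per channel pair,
#  i.e. output pixels * kernel area). All values are illustrative.
layers = [
    (3,   96,  55 * 55 * 11 * 11),
    (96,  256, 27 * 27 * 5 * 5),
    (256, 384, 13 * 13 * 3 * 3),
    (384, 384, 13 * 13 * 3 * 3),
    (384, 256, 13 * 13 * 3 * 3),
]
TOTAL_MULTS = 1024   # assumed total multiplier budget on the FPGA
CLOCK_HZ = 150e6     # assumed clock frequency

def cycles(layer, tn, tm):
    """Cycles one layer needs on a processor with a tn x tm multiplier array."""
    n, m, spatial = layer
    return math.ceil(n / tn) * math.ceil(m / tm) * spatial

def best_shape(group, mults):
    """Best tn x tm array shape (tn * tm <= mults) for a group of layers."""
    best = None
    for tn in range(1, mults + 1):
        tm = mults // tn
        c = sum(cycles(layer, tn, tm) for layer in group)
        if best is None or c < best[0]:
            best = (c, tn, tm)
    return best

# One big processor: every layer runs on the same array shape.
single_cycles, _, _ = best_shape(layers, TOTAL_MULTS)
single_tput = CLOCK_HZ / single_cycles

# Two specialized processors working concurrently on different images:
# split the layers contiguously and give each processor half the budget.
multi_tput = 0.0
for cut in range(1, len(layers)):
    r1 = CLOCK_HZ / best_shape(layers[:cut], TOTAL_MULTS // 2)[0]
    r2 = CLOCK_HZ / best_shape(layers[cut:], TOTAL_MULTS // 2)[0]
    multi_tput = max(multi_tput, min(r1, r2))  # pipeline limited by the slower stage

print(f"single processor : {single_tput:.1f} images/s")
print(f"two processors   : {multi_tput:.1f} images/s")
```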

    Strengthening measurements from the edges: application-level packet loss rate estimation

    Network users know much less than ISPs, Internet exchanges and content providers about what happens inside the network. Consequently, users can neither easily detect network neutrality violations nor readily exercise their market power by knowledgeably switching ISPs. This paper contributes to the ongoing efforts to empower users by proposing two models to estimate, via application-level measurements, a key network indicator: the packet loss rate (PLR) experienced by FTP-like TCP downloads. Controlled, testbed, and large-scale experiments show that the Inverse Mathis model is simpler and more consistent across the whole PLR range, but less accurate than the more advanced Likely Rexmit model for landline connections and moderate PLRs.
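
    The Inverse Mathis model mentioned here builds on the well-known Mathis relation for steady-state TCP throughput. A minimal sketch of that inversion is shown below; the constant, the example figures, and the simplifications are assumptions, and the paper's Inverse Mathis and Likely Rexmit estimators include refinements not captured here.

```python
import math

def inverse_mathis_plr(goodput_bps, rtt_s, mss_bytes=1460, c=math.sqrt(3 / 2)):
    """Estimate the packet loss rate p from a TCP download's average goodput.

    The Mathis relation models steady-state TCP throughput as
        rate ~= C * MSS / (RTT * sqrt(p)),
    so solving for the loss rate gives
        p ~= (C * MSS / (RTT * rate)) ** 2.
    """
    mss_bits = mss_bytes * 8
    return (c * mss_bits / (rtt_s * goodput_bps)) ** 2

# Example: a download averaging 8 Mbit/s over a path with a 50 ms RTT.
print(f"estimated PLR: {inverse_mathis_plr(8e6, 0.050):.4%}")
```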

    GPU-Based Data Processing for 2-D Microwave Imaging on MAST

    The Synthetic Aperture Microwave Imaging (SAMI) diagnostic is a Mega Amp Spherical Tokamak (MAST) diagnostic based at the Culham Centre for Fusion Energy. The acceleration of the SAMI data-processing code on a graphics processing unit is presented, demonstrating speedups of up to 60 times compared to the original IDL (Interactive Data Language) data-processing code. SAMI will now be capable of inter-shot processing, allowing pseudo-real-time control so that adjustments and optimizations can be made between shots. Additionally, for the first time, the analysis of many shots will be possible.
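
    As a generic illustration of this kind of offload (not the SAMI pipeline itself), the Python sketch below moves a batched FFT and power-spectrum step onto the GPU with CuPy when it is available, falling back to NumPy otherwise; the array shape and the processing step are invented.

```python
import numpy as np

try:
    import cupy as xp          # GPU path when CuPy and a CUDA device are present
    on_gpu = True
except ImportError:
    xp = np                    # CPU fallback keeps the sketch runnable anywhere
    on_gpu = False

# Invented stand-in for one shot's raw samples: (channels, time samples).
rng = np.random.default_rng(0)
raw = (rng.standard_normal((8, 1 << 20))
       + 1j * rng.standard_normal((8, 1 << 20))).astype(np.complex64)

data = xp.asarray(raw)                    # host -> device copy when on the GPU
spectra = xp.fft.fft(data, axis=1)        # batched FFT over all channels at once
power = xp.abs(spectra) ** 2              # per-channel spectral power
result = xp.asnumpy(power) if on_gpu else power   # copy the result back to the host

print("processed on", "GPU" if on_gpu else "CPU", "- output shape:", result.shape)
```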

    Minimum area, low cost FPGA implementation of AES

    The Rijndael cipher, designed by Joan Daemen and Vincent Rijmen and recently selected as the official Advanced Encryption Standard (AES), is well suited for hardware use. Its implementation can be carried out through several trade-offs between area and speed. This paper presents an 8-bit FPGA implementation of the AES cipher with a 128-bit block and a 128-bit key. The selected FPGA family is the Altera FLEX 10K. The cipher operates at 25 MHz and takes 470 clock cycles per encryption, resulting in a throughput of 6.8 Mbps. The design target was the optimisation of area and cost.
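
    The reported figures are mutually consistent: at 25 MHz, 470 cycles per 128-bit block works out to roughly 6.8 Mbps, as the short check below shows.

```python
CLOCK_HZ = 25e6         # reported operating frequency
CYCLES_PER_BLOCK = 470  # reported cycles to encrypt one block
BLOCK_BITS = 128        # AES block size

time_per_block_s = CYCLES_PER_BLOCK / CLOCK_HZ            # 18.8 microseconds
throughput_bps = BLOCK_BITS / time_per_block_s            # ~6.8e6 bits per second
print(f"throughput ~= {throughput_bps / 1e6:.1f} Mbps")   # matches the reported 6.8 Mbps
```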

    The Level-0 Muon Trigger for the LHCb Experiment

    A very compact architecture has been developed for the first-level muon trigger of the LHCb experiment, which processes 40 million proton-proton collisions per second. For each collision, it receives 3.2 kB of data and finds straight tracks within a 1.2-microsecond latency. The trigger implementation is massively parallel, pipelined, and fully synchronous with the LHC clock. It relies on 248 high-density Field Programmable Gate Arrays and on the massive use of multi-gigabit serial link transceivers embedded inside the FPGAs.
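
    Taking the quoted figures at face value gives a feel for the scale the trigger has to handle; the short calculation below (assuming a uniform 40 MHz collision rate) derives the aggregate input bandwidth and the number of events in flight during the fixed latency.

```python
COLLISION_RATE_HZ = 40e6      # proton-proton collisions per second
BYTES_PER_COLLISION = 3.2e3   # 3.2 kB of detector data per collision
LATENCY_S = 1.2e-6            # fixed trigger latency

input_bandwidth_bps = COLLISION_RATE_HZ * BYTES_PER_COLLISION   # bytes per second
events_in_flight = COLLISION_RATE_HZ * LATENCY_S                # events inside the pipeline

print(f"aggregate input : {input_bandwidth_bps / 1e9:.0f} GB/s")   # ~128 GB/s
print(f"events in flight: {events_in_flight:.0f}")                 # 48
```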

    Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

    Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high computational complexity of DNNs often necessitates extremely fast and efficient hardware, and the problem worsens as neural network models keep growing larger. As a result, customized hardware accelerators have been developed to accelerate DNN processing without sacrificing model accuracy. However, previous accelerator design studies have not fully considered the characteristics of the target applications, which may lead to sub-optimal architecture designs. On the other hand, new DNN models have been developed for better accuracy, but their compatibility with the underlying hardware accelerator is often overlooked. In this article, we propose an application-driven framework for architectural design space exploration of DNN accelerators. The framework is based on a hardware analytical model of individual DNN operations, and it models the accelerator design task as a multi-dimensional optimization problem. We demonstrate that it can be used effectively in application-driven accelerator architecture design. Given a target DNN, the framework can generate efficient accelerator design solutions with optimized performance and area. Furthermore, we explore the opportunity to use the framework to optimize an accelerator configuration for multiple diverse DNN applications running simultaneously. The framework is also capable of improving neural network models to best fit the underlying hardware resources.
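
    To make the idea of an analytical-model-driven design space exploration concrete, the sketch below sweeps a two-dimensional design space (processing elements and DRAM bandwidth) under an area budget, scoring each point with a roofline-style per-layer latency model. The layer workloads, cost coefficients, and budget are all invented; the paper's model and optimizer are more detailed.

```python
# Invented per-layer workloads: (MAC operations, bytes moved to/from DRAM).
layers = [
    (105e6, 1.2e6),
    (448e6, 2.9e6),
    (150e6, 3.5e6),
    (224e6, 4.7e6),
]
CLOCK_HZ = 1e9                  # assumed accelerator clock
AREA_BUDGET = 10.0              # arbitrary area units
PE_AREA, BW_AREA = 0.02, 0.5    # assumed area cost per PE / per GB/s of bandwidth

def latency_s(pes, bw_gbs):
    """Roofline-style estimate: each layer is either compute- or memory-bound."""
    total = 0.0
    for macs, dram_bytes in layers:
        t_compute = macs / (pes * CLOCK_HZ)
        t_memory = dram_bytes / (bw_gbs * 1e9)
        total += max(t_compute, t_memory)
    return total

# Exhaustive sweep of the two design dimensions under the area budget.
best = None
for pes in range(64, 513, 64):
    for bw in range(2, 33, 2):
        area = pes * PE_AREA + bw * BW_AREA
        if area > AREA_BUDGET:
            continue
        t = latency_s(pes, bw)
        if best is None or t < best[0]:
            best = (t, pes, bw, area)

t, pes, bw, area = best
print(f"best config: {pes} PEs, {bw} GB/s, area {area:.1f} -> {t * 1e3:.2f} ms per inference")
```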