
2,084 research outputs found

Dynamic Power Management for Neuromorphic Many-Core Systems

Author: Cederstroem Love
Dixius Andreas
Ellguth Georg
Furber Steve
Garside Jim
Hartmann Stephan
Hoeppner Sebastian
Mayr Christian
Neumaerker Felix
Partzsch Johannes
Plana Luis
Schiefer Stefan
Scholze Stefan
Vogginger Bernhard
Yan Yexin
Publication date: 01/01/2019

This work presents a dynamic power management architecture for neuromorphic many-core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PEs) to change their supply voltage and clock frequency individually and autonomously within less than 100 ns. This is employed by the neuromorphic simulation software flow, which defines the performance level (PL) of each PE based on the actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. Measurements of three neuromorphic benchmarks show that the total PE power consumption can be reduced by 75%, with 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management scheme is derived which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used for the second-generation SpiNNaker neuromorphic many-core system.
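The workload-driven choice of performance level described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The 0.7 V / 125 MHz and 1.0 V / 500 MHz endpoints come from the abstract; the middle level's values, the 1 ms simulation tick, the cycle-count workload estimate, and all names (`PerformanceLevel`, `select_level`, `relative_dynamic_energy`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceLevel:
    name: str
    voltage_v: float
    freq_hz: float


# Endpoints are taken from the abstract. The middle level is a
# hypothetical placeholder: the abstract does not state its values.
LEVELS = (
    PerformanceLevel("PL1", 0.70, 125e6),
    PerformanceLevel("PL2", 0.85, 250e6),  # assumed values
    PerformanceLevel("PL3", 1.00, 500e6),
)


def select_level(workload_cycles, tick_s=1e-3, levels=LEVELS):
    """Pick the slowest level that finishes the workload within one tick.

    workload_cycles is a hypothetical per-tick estimate of clock cycles
    needed; tick_s is an assumed simulation cycle length. If no level is
    fast enough, fall back to the fastest one.
    """
    for level in levels:  # ordered from slowest to fastest
        if workload_cycles <= level.freq_hz * tick_s:
            return level
    return levels[-1]


def relative_dynamic_energy(level, reference=LEVELS[-1]):
    """Dynamic energy per operation relative to the reference level.

    Uses the first-order CMOS relation E ~ C * V^2, ignoring leakage.
    """
    return (level.voltage_v / reference.voltage_v) ** 2


if __name__ == "__main__":
    for cycles in (100_000, 200_000, 400_000):
        level = select_level(cycles)
        print(cycles, level.name, round(relative_dynamic_energy(level), 2))
```

Under these assumptions, light ticks run at the lowest voltage and only bursts of activity raise the level, which is how lower average power can coexist with unchanged peak performance.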

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Motion estimation and CABAC VLSI co-processors for real-time high-quality H.264/AVC video coding

Author: Casula M.
Fanucci L.
Martina Maurizio
Masera Guido
Saponara S.
Publication venue: Elsevier
Publication date: 01/01/2010

Real-time, high-quality video coding is attracting wide interest in the research and industrial communities for different applications. H.264/AVC, a recent standard for high-performance video coding, can be successfully exploited in several scenarios, including digital video broadcasting, high-definition TV and DVD-based systems, which must sustain bit rates of up to tens of Mbits/s. To that end, this paper proposes optimized architectures for the most critical H.264/AVC tasks: motion estimation and context-adaptive binary arithmetic coding (CABAC). Post-synthesis results on sub-micron CMOS standard-cell technologies show that the proposed architectures can process 720 × 480 video sequences in real time at 30 frames/s and sustain more than 50 Mbits/s. The achieved circuit complexity and power consumption budgets are suitable for integration in complex VLSI multimedia systems based either on AHB bus-centric on-chip communication systems or on novel Network-on-Chip (NoC) infrastructures for MPSoC (Multi-Processor System on Chip).

Archivio della Ricerca - Università di Pisa

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Scalable and Low Power LDPC Decoder Design Using High Level Algorithmic Synthesis

Author: Cavallaro Joseph R.
Ly Tai
Sun Yang
Publication venue: IEEE
Publication date: 01/01/2009

This paper presents a scalable and low-power low-density parity-check (LDPC) decoder design for the next-generation wireless handset SoC. The methodology is based on high-level synthesis: the PICO (program-in chip-out) tool was used to produce efficient RTL directly from a sequential untimed C algorithm. We propose two parallel LDPC decoder architectures: (1) a per-layer decoding architecture with scalable parallelism, and (2) a multi-layer pipelined decoding architecture to achieve higher throughput. Based on the PICO technology, we have implemented a two-layer pipelined decoder in a TSMC 65 nm, 0.9 V, 8-metal-layer CMOS technology with a core area of 1.2 mm². The maximum achievable throughput is 415 Mbps when operating at a 400 MHz clock frequency, and the estimated peak power consumption is 180 mW.

Sponsors: Nokia; Nokia Siemens Networks (NSN); Xilinx; National Science Foundation

CiteSeerX

DSpace at Rice University

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Author: Aimar Alessandro
Calabrese Enrico
Corradi Federico
Delbruck Tobi
Linares-Barranco Alejandro
Liu Shih-Chii
Lungu Iulia-Alexandra
Milde Moritz B.
Mostafa Hesham
Rios-Navarro Antonio
Tapiador-Morales Ricardo
Publication date: 01/01/2017

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28 nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.

arXiv.org e-Print Archive

ZORA

Western Sydney ResearchDirect

idUS. Depósito de Investigación Universidad de Sevilla

A 6 mW, 5,000-Word Real-Time Speech Recognizer Using WFST Models

Author: Chandrakasan Anantha P.
Glass James R.
Price Michael R.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/09/2014

We describe an IC that provides a local speech recognition capability for a variety of electronic devices. We start with a generic speech decoder architecture that is programmable with industry-standard WFST and GMM speech models. Algorithmic and architectural enhancements are incorporated in order to achieve real-time performance amid system-level constraints on internal memory size and external memory bandwidth. A 2.5 × 2.5 mm test chip implementing this architecture was fabricated using a 65 nm process. The chip performs a 5,000-word recognition task in real time with a 13.0% word error rate, 6.0 mW core power consumption, and a search efficiency of approximately 16 nJ per hypothesis.

Sponsors: Quanta Computer (Firm); Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship

DSpace@MIT

Crossref

Evaluation of commercial ADC radiation tolerance for accelerator experiments

Author: Chen Hucheng
Chen Kai
Hu Xueye
Kierstead James
Lanni Francesco
Mead Joseph
Minelli Marena
Rescia Sergio
Takai Helio
Xu Hao
Publication venue: IOP Publishing
Publication date: 08/05/2015

Electronic components used in high-energy physics experiments are subjected to a radiation background composed of high-energy hadrons, mesons and photons. These particles can induce permanent and transient effects that affect normal device operation. Ionizing dose and displacement damage can cause chronic damage that disables the device permanently. Transient effects, or single event effects, are in general recoverable, with time intervals that depend on the nature of the failure. The magnitude of these effects is technology dependent, with feature size being one of the key parameters. Analog-to-digital converters are components that are frequently used in detector front-end electronics, generally placed as close as possible to the sensing elements to maximize signal fidelity. We report on radiation effects tests conducted on 17 commercially available analog-to-digital converters and extensive single event effect measurements on specific twelve- and fourteen-bit ADCs that presented high tolerance to ionizing dose. Mitigation strategies for single event effects (SEE) are discussed for their use in the Large Hadron Collider environment.

Comment: 16 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX