Search CORE

65,971 research outputs found

Energy-Efficient Neural Network Architectures

Author: Wu Hsi-Shou
Publication venue
Publication date: 01/01/2018
Field of study

Emerging systems for artificial intelligence (AI) are expected to rely on deep neural networks (DNNs) to achieve high accuracy for a broad variety of applications, including computer vision, robotics, and speech recognition. Due to the rapid growth of network size and depth, however, DNNs typically result in high computational costs and introduce considerable power and performance overheads. Dedicated chip architectures that implement DNNs with high energy efficiency are essential for adding intelligence to interactive edge devices, enabling them to complete increasingly sophisticated tasks by extending battery lie. They are also vital for improving performance in cloud servers that support demanding AI computations. This dissertation focuses on architectures and circuit technologies for designing energy-efficient neural network accelerators. First, a deep-learning processor is presented for achieving ultra-low power operation. Using a heterogeneous architecture that includes a low-power always-on front-end and a selectively-enabled high-performance back-end, the processor dynamically adjusts computational resources at runtime to support conditional execution in neural networks and meet performance targets with increased energy efficiency. Featuring a reconfigurable datapath and a memory architecture optimized for energy efficiency, the processor supports multilevel dynamic activation of neural network segments, performing object detection tasks with 5.3x lower energy consumption in comparison with a static execution baseline. Fabricated in 40nm CMOS, the processor test-chip dissipates 0.23mW at 5.3 fps. It demonstrates energy scalability up to 28.6 TOPS/W and can be configured to run a variety of workloads, including severely power-constrained ones such as always-on monitoring in mobile applications. To further improve the energy efficiency of the proposed heterogeneous architecture, a new charge-recovery logic family, called zero-short-circuit current (ZSCC) logic, is proposed to decrease the power consumption of the always-on front-end. By relying on dedicated circuit topologies and a four-phase clocking scheme, ZSCC operates with significantly reduced short-circuit currents, realizing order-of-magnitude power savings at relatively low clock frequencies (in the order of a few MHz). The efficiency and applicability of ZSCC is demonstrated through an ANSI S1.11 1/3 octave filter bank chip for binaural hearing aids with two microphones per ear. Fabricated in a 65nm CMOS process, this charge-recovery chip consumes 13.8µW with a 1.75MHz clock frequency, achieving 9.7x power reduction per input in comparison with a 40nm monophonic single-input chip that represents the published state of the art. The ability of ZSCC to further increase the energy efficiency of the heterogeneous neural network architecture is demonstrated through the design and evaluation of a ZSCC-based front-end. Simulation results show 17x power reduction compared with a conventional static CMOS implementation of the same architecture.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147614/1/hsiwu_1.pd

Deep Blue Documents at the University of Michigan

Comprehensive characterization of an open source document search engine

Author: Antoniou Georgia
Hadjilambrou Zacharias
Kleanthous Marios
Portero Antoni
Sazeides Yiannakis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation ands the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies but with diminishing benefits as incoming query traffic increases, (b) low power servers given enough partitioning can provide same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search.Web of Science162art. no. 1

DSpace at VSB Technical University of Ostrava

DIA: A complexity-effective decoding architecture

Author: Falcón Samper Ayose Jesus
Ramírez Bellido Alejandro
Santana Jaria Oliverio J.
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Fast instruction decoding is a true challenge for the design of CISC microprocessors implementing variable-length instructions. A well-known solution to overcome this problem is caching decoded instructions in a hardware buffer. Fetching already decoded instructions avoids the need for decoding them again, improving processor performance. However, introducing such special--purpose storage in the processor design involves an important increase in the fetch architecture complexity. In this paper, we propose a novel decoding architecture that reduces the fetch engine implementation cost. Instead of using a special-purpose hardware buffer, our proposal stores frequently decoded instructions in the memory hierarchy. The address where the decoded instructions are stored is kept in the branch prediction mechanism, enabling it to guide our decoding architecture. This makes it possible for the processor front end to fetch already decoded instructions from the memory instead of the original nondecoded instructions. Our results show that using our decoding architecture, a state-of-the-art superscalar processor achieves competitive performance improvements, while requiring less chip area and energy consumption in the fetch architecture than a hardware code caching mechanism.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

SAMI: Service-Based Arbitrated Multi-Tier Infrastructure for Mobile Cloud Computing

Author: Abolfazli Saeid
Gani Abdullah
Sanaei Zohreh
Shiraz Muhammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Mobile Cloud Computing (MCC) is the state-ofthe- art mobile computing technology aims to alleviate resource poverty of mobile devices. Recently, several approaches and techniques have been proposed to augment mobile devices by leveraging cloud computing. However, long-WAN latency and trust are still two major issues in MCC that hinder its vision. In this paper, we analyze MCC and discuss its issues. We leverage Service Oriented Architecture (SOA) to propose an arbitrated multi-tier infrastructure model named SAMI for MCC. Our architecture consists of three major layers, namely SOA, arbitrator, and infrastructure. The main strength of this architecture is in its multi-tier infrastructure layer which leverages infrastructures from three main sources of Clouds, Mobile Network Operators (MNOs), and MNOs' authorized dealers. On top of the infrastructure layer, an arbitrator layer is designed to classify Services and allocate them the suitable resources based on several metrics such as resource requirement, latency and security. Utilizing SAMI facilitate development and deployment of service-based platform-neutral mobile applications.Comment: 6 full pages, accepted for publication in IEEE MobiCC'12 conference, MobiCC 2012:IEEE Workshop on Mobile Cloud Computing, Beijing, Chin

arXiv.org e-Print Archive

Crossref

UM Digital Repository

Commissioning of the CMS High Level Trigger

The CMS experiment will collect data from the proton-proton collisions delivered by the Large Hadron Collider (LHC) at a centre-of-mass energy up to 14 TeV. The CMS trigger system is designed to cope with unprecedented luminosities and LHC bunch-crossing rates up to 40 MHz. The unique CMS trigger architecture only employs two trigger levels. The Level-1 trigger is implemented using custom electronics, while the High Level Trigger (HLT) is based on software algorithms running on a large cluster of commercial processors, the Event Filter Farm. We present the major functionalities of the CMS High Level Trigger system as of the starting of LHC beams operations in September 2008. The validation of the HLT system in the online environment with Monte Carlo simulated data and its commissioning during cosmic rays data taking campaigns are discussed in detail. We conclude with the description of the HLT operations with the first circulating LHC beams before the incident occurred the 19th September 2008

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

CERN Document Server

Archivio istituzionale della ricerca - Università di Padova