AXTAR: Mission Design Concept
The Advanced X-ray Timing Array (AXTAR) is a mission concept for X-ray timing
of compact objects that combines very large collecting area, broadband spectral
coverage, high time resolution, highly flexible scheduling, and an ability to
respond promptly to time-critical targets of opportunity. It is optimized for
submillisecond timing of bright Galactic X-ray sources in order to study
phenomena at the natural time scales of neutron star surfaces and black hole
event horizons, thus probing the physics of ultradense matter, strongly curved
spacetimes, and intense magnetic fields. AXTAR's main instrument, the Large
Area Timing Array (LATA), is a collimated instrument with 2-50 keV coverage and
over 3 square meters of effective area. The LATA is made up of an array of
supermodules that house 2-mm thick silicon pixel detectors. AXTAR will provide
a significant improvement in effective area (a factor of 7 at 4 keV and a
factor of 36 at 30 keV) over the RXTE PCA. AXTAR will also carry a sensitive
Sky Monitor (SM) that acts as a trigger for pointed observations of X-ray
transients in addition to providing high duty cycle monitoring of the X-ray
sky. We review the science goals and technical concept for AXTAR and present
results from a preliminary mission design study.
Comment: 19 pages, 10 figures, to be published in Space Telescopes and
Instrumentation 2010: Ultraviolet to Gamma Ray, Proceedings of SPIE Volume
773
EIE: Efficient Inference Engine on Compressed Deep Neural Network
State-of-the-art deep neural networks (DNNs) have hundreds of millions of
connections and are both computationally and memory intensive, making them
difficult to deploy on embedded systems with limited hardware resources and
power budgets. While custom hardware helps the computation, fetching weights
from DRAM is two orders of magnitude more expensive than ALU operations, and
dominates the required power.
Previously proposed 'Deep Compression' makes it possible to fit large DNNs
(AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by
pruning the redundant connections and having multiple connections share the
same weight. We propose an energy efficient inference engine (EIE) that
performs inference on this compressed network model and accelerates the
resulting sparse matrix-vector multiplication with weight sharing. Going from
DRAM to SRAM gives EIE a 120x energy saving; exploiting sparsity saves 10x;
weight sharing gives 8x; and skipping zero activations from ReLU saves another 3x.
Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster when compared to
CPU and GPU implementations of the same DNN without compression. EIE has a
processing power of 102 GOP/s working directly on a compressed network,
corresponding to 3 TOP/s on an uncompressed network, and processes FC layers of
AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600 mW. It is
24,000x and 3,400x more energy efficient than a CPU and a GPU, respectively.
Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy
efficiency and area efficiency.
Comment: External Links: TheNextPlatform: http://goo.gl/f7qX0L ; O'Reilly:
https://goo.gl/Id1HNT ; Hacker News: https://goo.gl/KM72SV ; Embedded-vision:
http://goo.gl/joQNg8 ; Talk at NVIDIA GTC'16: http://goo.gl/6wJYvn ; Talk at
Embedded Vision Summit: https://goo.gl/7abFNe ; Talk at Stanford University:
https://goo.gl/6lwuer. Published as a conference paper in ISCA 201
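The core operation EIE accelerates, sparse matrix-vector multiplication over weight-shared (codebook-indexed) weights with zero activations skipped, can be sketched in plain Python. This is an illustrative model with made-up CSC-style arrays and a hypothetical 2-entry codebook, not the paper's hardware datapath:

```python
import numpy as np

def eie_matvec(col_ptr, row_idx, code_idx, codebook, x, n_rows):
    """y = W @ x, where W is stored column-wise (CSC-like) and every
    nonzero weight is a small index into a shared-weight codebook."""
    y = np.zeros(n_rows)
    for j, a in enumerate(x):
        if a == 0.0:               # skip zero activations from ReLU
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += codebook[code_idx[k]] * a
    return y

# Tiny example: a 3x3 matrix with 4 nonzeros and a 2-entry codebook.
codebook = np.array([0.5, -1.0])
col_ptr  = [0, 2, 2, 4]            # column 1 holds no nonzeros
row_idx  = [0, 2, 1, 2]
code_idx = [0, 1, 1, 0]
x = np.array([2.0, 0.0, 1.0])      # x[1] = 0 is skipped entirely
y = eie_matvec(col_ptr, row_idx, code_idx, codebook, x, 3)
```

Storing small codebook indices instead of full-width weights is what lets the compressed model fit in on-chip SRAM in the first place.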
Infrastructure for Detector Research and Development towards the International Linear Collider
The EUDET-project was launched to create an infrastructure for developing and
testing new and advanced detector technologies to be used at a future linear
collider. The aim was to enable experimentation and data analysis for
institutes that otherwise could not realize such work due to a lack of
resources. The infrastructure comprised an analysis and software network, and
instrumentation infrastructures for tracking detectors as well as for
calorimetry.
Comment: 54 pages, 48 pictures
A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones
Fully autonomous miniaturized robots (e.g., drones) with artificial
intelligence (AI) based visual navigation capabilities are extremely
challenging drivers of Internet-of-Things edge intelligence.
Visual navigation based on AI approaches, such as deep neural networks (DNNs),
is becoming pervasive for standard-size drones, but is considered out of
reach for nano-drones with a size of a few cm. In this work, we
present the first (to the best of our knowledge) demonstration of a navigation
engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based
visual navigation. To achieve this goal we developed a complete methodology for
parallel execution of complex DNNs directly on board resource-constrained
milliwatt-scale nodes. Our system is based on GAP8, a novel parallel
ultra-low-power computing platform, and a 27 g commercial, open-source
CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the
software mapping techniques that enable the state-of-the-art deep convolutional
neural network presented in [1] to be fully executed on-board within a strict 6
fps real-time constraint with no compromise in terms of flight results, while
all processing is done with only 64 mW on average. Our navigation engine is
flexible and can be used to span a wide performance range: at its peak
performance corner it achieves 18 fps while still consuming on average just
3.5% of the power envelope of the deployed nano-aircraft.
Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication
in the IEEE Internet of Things Journal (IEEE IOTJ
XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes
Heavily quantized fixed-point arithmetic is becoming a common approach to deploy Convolutional Neural Networks (CNNs) on limited-memory low-power IoT end-nodes. However, this trend is hampered by the lack of support for low-bitwidth operations in the arithmetic units of state-of-the-art embedded Microcontrollers (MCUs). This work proposes a multi-precision arithmetic unit fully integrated into a RISC-V processor at the micro-architectural and ISA level to boost the efficiency of heavily Quantized Neural Network (QNN) inference on microcontroller-class cores. By extending the ISA with nibble (4-bit) and crumb (2-bit) SIMD instructions, we show near-linear speedup with respect to higher-precision integer computation on the key kernels for QNN computation. Also, we propose a custom execution paradigm for SIMD sum-of-dot-product operations, which consists of fusing a dot product with a load operation, yielding up to a 1.64x peak MAC/cycle improvement compared to a standard execution scenario. To further push the efficiency, we integrate the RISC-V extended core in a parallel cluster of 8 processors, with near-linear improvement with respect to a single-core architecture. To evaluate the proposed extensions, we fully implement the cluster of processors in GF22FDX technology. QNN convolution kernels on a parallel cluster implementing the proposed extension run 6x and 8x faster when considering 4- and 2-bit data operands, respectively, compared to a baseline processing cluster only supporting 8-bit SIMD instructions. With a peak of 2.22 TOP/s/W, the proposed solution achieves efficiency levels comparable with dedicated DNN inference accelerators and up to three orders of magnitude better than state-of-the-art ARM Cortex-M based microcontroller systems such as the low-end STM32L4 MCU and the high-end STM32H7 MCU.
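As a rough software model of what a nibble (4-bit) SIMD sum-of-dot-product instruction computes: the function names below are hypothetical, and the real extension performs this in a single RISC-V instruction rather than a Python loop.

```python
def pack_nibbles(vals):
    """Pack up to eight unsigned 4-bit values into one 32-bit word."""
    word = 0
    for i, v in enumerate(vals):
        assert 0 <= v < 16 and i < 8
        word |= v << (4 * i)
    return word

def sdotp_nibble(acc, wa, wb):
    """Simulated SIMD sum-of-dot-product: acc += the lane-wise products
    of the eight 4-bit lanes of wa and wb (unsigned, no saturation)."""
    for i in range(8):
        a = (wa >> (4 * i)) & 0xF
        b = (wb >> (4 * i)) & 0xF
        acc += a * b
    return acc

a = pack_nibbles([1, 2, 3, 4])
b = pack_nibbles([5, 6, 7, 8])
acc = sdotp_nibble(0, a, b)        # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

Eight 4-bit multiply-accumulates per 32-bit word, versus four 8-bit lanes, is where the near-linear speedup over 8-bit SIMD quoted above comes from.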
A power-saving modulation technique for time-of-flight range imaging sensors
Time-of-flight range imaging cameras measure distance and intensity simultaneously for every pixel in an image. With the continued advancement of the technology, a wide variety of new depth sensing applications are emerging; however, a number of these potential applications have stringent electrical power constraints that are difficult to meet with the current state-of-the-art systems. Sensor gain modulation contributes a significant proportion of the total image sensor power consumption, and as higher spatial resolution range image sensors operating at higher modulation frequencies (to achieve better measurement precision) are developed, this proportion is likely to increase. The authors have developed a new sensor modulation technique using resonant circuit concepts that is more power efficient than the standard mode of operation. With a proof of principle system, a 93-96% reduction in modulation drive power was demonstrated across a range of modulation frequencies from 1-11 MHz. Finally, an evaluation of the range imaging performance revealed an improvement in measurement linearity in the resonant configuration due primarily to the more sinusoidal shape of the resonant electrical waveforms, while the average precision values were comparable between the standard and resonant operating modes.
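For context (this relation is standard continuous-wave ToF background, not taken from the paper): the modulation frequency sets both the unambiguous range and, together with phase-measurement noise, the distance precision. A minimal sketch:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def phase_to_distance(phi, f_mod):
    """Distance implied by a measured phase shift phi (radians) at
    modulation frequency f_mod (Hz): d = c * phi / (4 * pi * f_mod)."""
    return C * phi / (4.0 * math.pi * f_mod)

def unambiguous_range(f_mod):
    """Measured distances wrap every c / (2 * f_mod) metres."""
    return C / (2.0 * f_mod)

# At 11 MHz, a half-turn of phase lands midway through the ~13.6 m
# unambiguous interval.
d = phase_to_distance(math.pi, 11e6)
```

Raising f_mod improves precision for a given phase noise but shrinks the unambiguous range, which is why higher-frequency sensors are attractive despite their higher modulation drive power.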
YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration
Convolutional neural networks (CNNs) have revolutionized the world of
computer vision over the last few years, pushing image classification beyond
human accuracy. The computational effort of today's CNNs requires power-hungry
parallel processors or GP-GPUs. Recent developments in CNN accelerators for
system-on-chip integration have reduced energy consumption significantly.
Unfortunately, even these highly optimized devices are above the power envelope
imposed by mobile and deeply embedded applications and face hard limitations
caused by CNN weight I/O and storage. This prevents the adoption of CNNs in
future ultra-low power Internet of Things end-nodes for near-sensor analytics.
Recent algorithmic and theoretical advancements enable competitive
classification accuracy even when limiting CNNs to binary (+1/-1) weights
during training. These new findings bring major optimization opportunities in
the arithmetic core by removing the need for expensive multiplications, as well
as reducing I/O bandwidth and storage. In this work, we present an accelerator
optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core
area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm^2, and with a power
dissipation of 895 uW in UMC 65 nm technology at 0.6 V. Our accelerator
significantly outperforms the state-of-the-art in terms of energy and area
efficiency, achieving 61.2 TOp/s/W and 1135 GOp/s/MGE, respectively.
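The arithmetic simplification that binary weights enable can be shown in a few lines (an illustrative sketch, not YodaNN's datapath): with weights constrained to {+1, -1}, every multiply in a dot product collapses to an add or a subtract.

```python
import numpy as np

def binary_dot(signs, x):
    """Dot product with +1/-1 weights encoded as a boolean mask
    (True -> +1, False -> -1); only additions and subtractions."""
    return x[signs].sum() - x[~signs].sum()

signs = np.array([True, False, True, True])
x = np.array([1.0, 2.0, 3.0, 4.0])
r = binary_dot(signs, x)           # 1 - 2 + 3 + 4 = 6
```

In hardware, removing the need for multipliers and shrinking weight storage to one bit each is where most of the energy and area saving comes from.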
The UA9 experimental layout
The UA9 experimental equipment was installed in the CERN-SPS in March '09
with the aim of investigating crystal assisted collimation in coasting mode.
Its basic layout comprises silicon bent crystals acting as primary
collimators mounted inside two vacuum vessels. A movable 60 cm long block of
tungsten located downstream at about 90 degrees phase advance intercepts the
deflected beam.
Scintillators, Gas Electron Multiplier chambers and other beam loss monitors
measure nuclear loss rates induced by the interaction of the beam halo in the
crystal. Roman pots are installed in the path of the deflected particles and
are equipped with a Medipix detector to reconstruct the transverse distribution
of the impinging beam. Finally UA9 takes advantage of an LHC-collimator
prototype installed close to the Roman pot to help in setting the beam
conditions and to analyze the efficiency to deflect the beam. This paper
describes in detail the hardware installed to study crystal collimation
during 2010.
Comment: 15 pages, 11 figures, submitted to JINS