AXTAR: Mission Design Concept
The Advanced X-ray Timing Array (AXTAR) is a mission concept for X-ray timing
of compact objects that combines very large collecting area, broadband spectral
coverage, high time resolution, highly flexible scheduling, and an ability to
respond promptly to time-critical targets of opportunity. It is optimized for
submillisecond timing of bright Galactic X-ray sources in order to study
phenomena at the natural time scales of neutron star surfaces and black hole
event horizons, thus probing the physics of ultradense matter, strongly curved
spacetimes, and intense magnetic fields. AXTAR's main instrument, the Large
Area Timing Array (LATA), is a collimated instrument with 2-50 keV coverage and
over 3 square meters of effective area. The LATA is made up of an array of
supermodules that house 2-mm thick silicon pixel detectors. AXTAR will provide
a significant improvement in effective area (a factor of 7 at 4 keV and a
factor of 36 at 30 keV) over the RXTE PCA. AXTAR will also carry a sensitive
Sky Monitor (SM) that acts as a trigger for pointed observations of X-ray
transients in addition to providing high duty cycle monitoring of the X-ray
sky. We review the science goals and technical concept for AXTAR and present
results from a preliminary mission design study.
Comment: 19 pages, 10 figures, to be published in Space Telescopes and
Instrumentation 2010: Ultraviolet to Gamma Ray, Proceedings of SPIE Volume
773
EIE: Efficient Inference Engine on Compressed Deep Neural Network
State-of-the-art deep neural networks (DNNs) have hundreds of millions of
connections and are both computationally and memory intensive, making them
difficult to deploy on embedded systems with limited hardware resources and
power budgets. While custom hardware helps the computation, fetching weights
from DRAM is two orders of magnitude more expensive than ALU operations, and
dominates the required power.
Previously proposed 'Deep Compression' makes it possible to fit large DNNs
(AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by
pruning the redundant connections and having multiple connections share the
same weight. We propose an energy efficient inference engine (EIE) that
performs inference on this compressed network model and accelerates the
resulting sparse matrix-vector multiplication with weight sharing. Going from
DRAM to SRAM gives EIE a 120x energy saving; exploiting sparsity saves 10x;
weight sharing gives 8x; and skipping zero activations from ReLU saves another 3x.
Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster when compared to
CPU and GPU implementations of the same DNN without compression. EIE has a
processing power of 102 GOP/s working directly on a compressed network,
corresponding to 3 TOP/s on an uncompressed network, and processes FC layers of
AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600 mW. It is
24,000x and 3,400x more energy efficient than a CPU and a GPU, respectively.
Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy
efficiency and area efficiency.
Comment: External Links: TheNextPlatform: http://goo.gl/f7qX0L ; O'Reilly:
https://goo.gl/Id1HNT ; Hacker News: https://goo.gl/KM72SV ; Embedded-vision:
http://goo.gl/joQNg8 ; Talk at NVIDIA GTC'16: http://goo.gl/6wJYvn ; Talk at
Embedded Vision Summit: https://goo.gl/7abFNe ; Talk at Stanford University:
https://goo.gl/6lwuer. Published as a conference paper in ISCA 201
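The core operation EIE accelerates, sparse matrix-vector multiplication over weight-shared (codebook-indexed) weights with zero activations skipped, can be sketched in plain Python. This is an illustrative model with made-up CSC-style arrays and a hypothetical 2-entry codebook, not the paper's hardware datapath:

```python
import numpy as np

def eie_matvec(col_ptr, row_idx, code_idx, codebook, x, n_rows):
    """y = W @ x, where W is stored column-wise (CSC-like) and every
    nonzero weight is a small index into a shared-weight codebook."""
    y = np.zeros(n_rows)
    for j, a in enumerate(x):
        if a == 0.0:               # skip zero activations from ReLU
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += codebook[code_idx[k]] * a
    return y

# Tiny example: a 3x3 matrix with 4 nonzeros and a 2-entry codebook.
codebook = np.array([0.5, -1.0])
col_ptr  = [0, 2, 2, 4]            # column 1 holds no nonzeros
row_idx  = [0, 2, 1, 2]
code_idx = [0, 1, 1, 0]
x = np.array([2.0, 0.0, 1.0])      # x[1] = 0 is skipped entirely
y = eie_matvec(col_ptr, row_idx, code_idx, codebook, x, 3)
```

Storing small codebook indices instead of full-width weights is what lets the compressed model fit in on-chip SRAM in the first place.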
Infrastructure for Detector Research and Development towards the International Linear Collider
The EUDET-project was launched to create an infrastructure for developing and
testing new and advanced detector technologies to be used at a future linear
collider. The aim was to enable experimentation and data analysis for
institutes that otherwise could not realize such work due to a lack of
resources. The infrastructure comprised an analysis and software network, and
instrumentation infrastructures for tracking detectors as well as for
calorimetry.
Comment: 54 pages, 48 pictures
A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones
Fully autonomous miniaturized robots (e.g., drones) with artificial
intelligence (AI) based visual navigation capabilities are extremely
challenging drivers of Internet-of-Things edge intelligence.
Visual navigation based on AI approaches, such as deep neural networks (DNNs),
is becoming pervasive for standard-size drones, but is considered out of
reach for nano-drones with a size of a few cm. In this work, we
present the first (to the best of our knowledge) demonstration of a navigation
engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based
visual navigation. To achieve this goal we developed a complete methodology for
parallel execution of complex DNNs directly on board resource-constrained
milliwatt-scale nodes. Our system is based on GAP8, a novel parallel
ultra-low-power computing platform, and a 27 g commercial, open-source
CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the
software mapping techniques that enable the state-of-the-art deep convolutional
neural network presented in [1] to be fully executed on-board within a strict 6
fps real-time constraint with no compromise in terms of flight results, while
all processing is done with only 64 mW on average. Our navigation engine is
flexible and can be used to span a wide performance range: at its peak
performance corner it achieves 18 fps while still consuming on average just
3.5% of the power envelope of the deployed nano-aircraft.
Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication
in the IEEE Internet of Things Journal (IEEE IOTJ
XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes
Heavily quantized fixed-point arithmetic is becoming a common approach to deploy Convolutional Neural Networks (CNNs) on limited-memory low-power IoT end-nodes. However, this trend is hampered by the lack of support for low-bitwidth operations in the arithmetic units of state-of-the-art embedded Microcontrollers (MCUs). This work proposes a multi-precision arithmetic unit fully integrated into a RISC-V processor at the micro-architectural and ISA level to boost the efficiency of heavily Quantized Neural Network (QNN) inference on microcontroller-class cores. By extending the ISA with nibble (4-bit) and crumb (2-bit) SIMD instructions, we show near-linear speedup with respect to higher-precision integer computation on the key kernels for QNN computation. Also, we propose a custom execution paradigm for SIMD sum-of-dot-product operations, which consists of fusing a dot product with a load operation, yielding up to a 1.64x peak MAC/cycle improvement compared to a standard execution scenario. To further push the efficiency, we integrate the RISC-V extended core in a parallel cluster of 8 processors, with near-linear improvement with respect to a single-core architecture. To evaluate the proposed extensions, we fully implement the cluster of processors in GF22FDX technology. QNN convolution kernels on a parallel cluster implementing the proposed extension run 6x and 8x faster when considering 4- and 2-bit data operands, respectively, compared to a baseline processing cluster only supporting 8-bit SIMD instructions. With a peak of 2.22 TOP/s/W, the proposed solution achieves efficiency levels comparable with dedicated DNN inference accelerators and up to three orders of magnitude better than state-of-the-art ARM Cortex-M based microcontroller systems such as the low-end STM32L4 MCU and the high-end STM32H7 MCU.
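As a rough software model of what a nibble (4-bit) SIMD sum-of-dot-product instruction computes: the function names below are hypothetical, and the real extension performs this in a single RISC-V instruction rather than a Python loop.

```python
def pack_nibbles(vals):
    """Pack up to eight unsigned 4-bit values into one 32-bit word."""
    word = 0
    for i, v in enumerate(vals):
        assert 0 <= v < 16 and i < 8
        word |= v << (4 * i)
    return word

def sdotp_nibble(acc, wa, wb):
    """Simulated SIMD sum-of-dot-product: acc += the lane-wise products
    of the eight 4-bit lanes of wa and wb (unsigned, no saturation)."""
    for i in range(8):
        a = (wa >> (4 * i)) & 0xF
        b = (wb >> (4 * i)) & 0xF
        acc += a * b
    return acc

a = pack_nibbles([1, 2, 3, 4])
b = pack_nibbles([5, 6, 7, 8])
acc = sdotp_nibble(0, a, b)        # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

Eight 4-bit multiply-accumulates per 32-bit word, versus four 8-bit lanes, is where the near-linear speedup over 8-bit SIMD quoted above comes from.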
A power-saving modulation technique for time-of-flight range imaging sensors
Time-of-flight range imaging cameras measure distance and intensity simultaneously for every pixel in an image. With the continued advancement of the technology, a wide variety of new depth sensing applications are emerging; however, a number of these potential applications have stringent electrical power constraints that are difficult to meet with the current state-of-the-art systems. Sensor gain modulation contributes a significant proportion of the total image sensor power consumption, and as higher spatial resolution range image sensors operating at higher modulation frequencies (to achieve better measurement precision) are developed, this proportion is likely to increase. The authors have developed a new sensor modulation technique using resonant circuit concepts that is more power efficient than the standard mode of operation. With a proof of principle system, a 93-96% reduction in modulation drive power was demonstrated across a range of modulation frequencies from 1-11 MHz. Finally, an evaluation of the range imaging performance revealed an improvement in measurement linearity in the resonant configuration due primarily to the more sinusoidal shape of the resonant electrical waveforms, while the average precision values were comparable between the standard and resonant operating modes.
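For context (this relation is standard continuous-wave ToF background, not taken from the paper): the modulation frequency sets both the unambiguous range and, together with phase-measurement noise, the distance precision. A minimal sketch:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def phase_to_distance(phi, f_mod):
    """Distance implied by a measured phase shift phi (radians) at
    modulation frequency f_mod (Hz): d = c * phi / (4 * pi * f_mod)."""
    return C * phi / (4.0 * math.pi * f_mod)

def unambiguous_range(f_mod):
    """Measured distances wrap every c / (2 * f_mod) metres."""
    return C / (2.0 * f_mod)

# At 11 MHz, a half-turn of phase lands midway through the ~13.6 m
# unambiguous interval.
d = phase_to_distance(math.pi, 11e6)
```

Raising f_mod improves precision for a given phase noise but shrinks the unambiguous range, which is why higher-frequency sensors are attractive despite their higher modulation drive power.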
YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration
Convolutional neural networks (CNNs) have revolutionized the world of
computer vision over the last few years, pushing image classification beyond
human accuracy. The computational effort of today's CNNs requires power-hungry
parallel processors or GP-GPUs. Recent developments in CNN accelerators for
system-on-chip integration have reduced energy consumption significantly.
Unfortunately, even these highly optimized devices are above the power envelope
imposed by mobile and deeply embedded applications and face hard limitations
caused by CNN weight I/O and storage. This prevents the adoption of CNNs in
future ultra-low power Internet of Things end-nodes for near-sensor analytics.
Recent algorithmic and theoretical advancements enable competitive
classification accuracy even when limiting CNNs to binary (+1/-1) weights
during training. These new findings bring major optimization opportunities in
the arithmetic core by removing the need for expensive multiplications, as well
as reducing I/O bandwidth and storage. In this work, we present an accelerator
optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core
area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm^2, and with a power
dissipation of 895 uW in UMC 65 nm technology at 0.6 V. Our accelerator
significantly outperforms the state-of-the-art in terms of energy and area
efficiency, achieving 61.2 TOp/s/W and 1135 GOp/s/MGE, respectively.
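The arithmetic simplification that binary weights enable can be shown in a few lines (an illustrative sketch, not YodaNN's datapath): with weights constrained to {+1, -1}, every multiply in a dot product collapses to an add or a subtract.

```python
import numpy as np

def binary_dot(signs, x):
    """Dot product with +1/-1 weights encoded as a boolean mask
    (True -> +1, False -> -1); only additions and subtractions."""
    return x[signs].sum() - x[~signs].sum()

signs = np.array([True, False, True, True])
x = np.array([1.0, 2.0, 3.0, 4.0])
r = binary_dot(signs, x)           # 1 - 2 + 3 + 4 = 6
```

In hardware, removing the need for multipliers and shrinking weight storage to one bit each is where most of the energy and area saving comes from.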
The UA9 experimental layout
The UA9 experimental equipment was installed in the CERN-SPS in March '09
with the aim of investigating crystal assisted collimation in coasting mode.
Its basic layout comprises silicon bent crystals acting as primary
collimators mounted inside two vacuum vessels. A movable 60 cm long block of
tungsten located downstream at about 90 degrees phase advance intercepts the
deflected beam.
Scintillators, Gas Electron Multiplier chambers and other beam loss monitors
measure nuclear loss rates induced by the interaction of the beam halo in the
crystal. Roman pots are installed in the path of the deflected particles and
are equipped with a Medipix detector to reconstruct the transverse distribution
of the impinging beam. Finally UA9 takes advantage of an LHC-collimator
prototype installed close to the Roman pot to help in setting the beam
conditions and to analyze the efficiency to deflect the beam. This paper
describes in detail the hardware installed to study crystal collimation
during 2010.
Comment: 15 pages, 11 figures, submitted to JINS