829 research outputs found

    AXTAR: Mission Design Concept

    The Advanced X-ray Timing Array (AXTAR) is a mission concept for X-ray timing of compact objects that combines very large collecting area, broadband spectral coverage, high time resolution, highly flexible scheduling, and an ability to respond promptly to time-critical targets of opportunity. It is optimized for submillisecond timing of bright Galactic X-ray sources in order to study phenomena at the natural time scales of neutron star surfaces and black hole event horizons, thus probing the physics of ultradense matter, strongly curved spacetimes, and intense magnetic fields. AXTAR's main instrument, the Large Area Timing Array (LATA), is a collimated instrument with 2-50 keV coverage and over 3 square meters of effective area. The LATA is made up of an array of supermodules that house 2-mm thick silicon pixel detectors. AXTAR will provide a significant improvement in effective area (a factor of 7 at 4 keV and a factor of 36 at 30 keV) over the RXTE PCA. AXTAR will also carry a sensitive Sky Monitor (SM) that acts as a trigger for pointed observations of X-ray transients in addition to providing high duty cycle monitoring of the X-ray sky. We review the science goals and technical concept for AXTAR and present results from a preliminary mission design study. Comment: 19 pages, 10 figures, to be published in Space Telescopes and Instrumentation 2010: Ultraviolet to Gamma Ray, Proceedings of SPIE Volume 773

    EIE: Efficient Inference Engine on Compressed Deep Neural Network

    State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations, and dominates the required power. The previously proposed 'Deep Compression' makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE a 120x energy saving; exploiting sparsity saves 10x; weight sharing gives 8x; skipping zero activations from ReLU saves another 3x. Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster when compared to CPU and GPU implementations of the same DNN without compression. EIE has a processing power of 102 GOPS/s working directly on a compressed network, corresponding to 3 TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600 mW. It is 24,000x and 3,400x more energy efficient than a CPU and GPU respectively. Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy efficiency and area efficiency. Comment: External Links: TheNextPlatform: http://goo.gl/f7qX0L ; O'Reilly: https://goo.gl/Id1HNT ; Hacker News: https://goo.gl/KM72SV ; Embedded-vision: http://goo.gl/joQNg8 ; Talk at NVIDIA GTC'16: http://goo.gl/6wJYvn ; Talk at Embedded Vision Summit: https://goo.gl/7abFNe ; Talk at Stanford University: https://goo.gl/6lwuer. Published as a conference paper in ISCA 201
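
The sparse matrix-vector multiplication with weight sharing described in this abstract can be illustrated with a small software sketch. This is not the EIE hardware design or its actual storage format: the column-wise layout, the `compressed_matvec` function, and the example values are assumptions made only for illustration. Each surviving weight is stored as an index into a small shared codebook, and columns whose input activation is zero are skipped entirely.

```python
from typing import List, Tuple

# Hypothetical compressed layout (an assumption, not EIE's real format):
# for each input column j, a list of (row_index, codebook_index) pairs
# describing the weights that survived pruning.
Column = List[Tuple[int, int]]


def compressed_matvec(columns: List[Column],
                      codebook: List[float],
                      activations: List[float],
                      n_rows: int) -> List[float]:
    """Compute W @ a for a pruned, weight-shared matrix W."""
    out = [0.0] * n_rows
    for j, a in enumerate(activations):
        if a == 0.0:
            continue  # skip zero activations (e.g. produced by ReLU)
        for row, code in columns[j]:
            # Weight sharing: the stored value is only a codebook index.
            out[row] += codebook[code] * a
    return out


codebook = [0.5, -1.0, 2.0]
columns = [
    [(0, 0), (2, 2)],  # column 0: W[0][0] = 0.5, W[2][0] = 2.0
    [(1, 1)],          # column 1: W[1][1] = -1.0
    [(0, 2), (1, 0)],  # column 2: W[0][2] = 2.0, W[1][2] = 0.5
]
activations = [1.0, 0.0, 3.0]  # the zero entry causes column 1 to be skipped
result = compressed_matvec(columns, codebook, activations, n_rows=3)
print(result)  # [6.5, 1.5, 2.0]
```

The sketch stores only non-zero weights and only small indices per weight, which mirrors the two compression ideas named in the abstract (pruning and weight sharing) without claiming anything about how the accelerator schedules the work.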

    Infrastructure for Detector Research and Development towards the International Linear Collider

    The EUDET project was launched to create an infrastructure for developing and testing new and advanced detector technologies to be used at a future linear collider. The aim was to enable institutes to carry out experiments and data analysis that they could not otherwise have realized for lack of resources. The infrastructure comprised an analysis and software network, and instrumentation infrastructures for tracking detectors as well as for calorimetry. Comment: 54 pages, 48 picture

    A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

    Fully-autonomous miniaturized robots (e.g., drones) with artificial intelligence (AI) based visual navigation capabilities are extremely challenging drivers of Internet-of-Things edge intelligence capabilities. Visual navigation based on AI approaches, such as deep neural networks (DNNs), is becoming pervasive for standard-size drones, but is considered out of reach for nano-drones with a size of a few cm². In this work, we present the first (to the best of our knowledge) demonstration of a navigation engine for autonomous nano-drones capable of closed-loop end-to-end DNN-based visual navigation. To achieve this goal we developed a complete methodology for parallel execution of complex DNNs directly on board resource-constrained milliwatt-scale nodes. Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and a 27 g commercial, open-source CrazyFlie 2.0 nano-quadrotor. As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average. Our navigation engine is flexible and can be used to span a wide performance range: at its peak performance corner it achieves 18 fps while still consuming on average just 3.5% of the power envelope of the deployed nano-aircraft. Comment: 15 pages, 13 figures, 5 tables, 2 listings, accepted for publication in the IEEE Internet of Things Journal (IEEE IOTJ

    XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes

    Heavily quantized fixed-point arithmetic is becoming a common approach to deploy Convolutional Neural Networks (CNNs) on limited-memory low-power IoT end-nodes. However, this trend is limited by the lack of support for low-bitwidth operands in the arithmetic units of state-of-the-art embedded Microcontrollers (MCUs). This work proposes a multi-precision arithmetic unit fully integrated into a RISC-V processor at the micro-architectural and ISA level to boost the efficiency of heavily Quantized Neural Network (QNN) inference on microcontroller-class cores. By extending the ISA with nibble (4-bit) and crumb (2-bit) SIMD instructions, we show near-linear speedup with respect to higher precision integer computation on the key kernels for QNN computation. Also, we propose a custom execution paradigm for SIMD sum-of-dot-product operations, which consists of fusing a dot product with a load operation, with an up to 1.64× peak MAC/cycle improvement compared to a standard execution scenario. To further push the efficiency, we integrate the RISC-V extended core in a parallel cluster of 8 processors, with near-linear improvement with respect to a single core architecture. To evaluate the proposed extensions, we fully implement the cluster of processors in GF22FDX technology. QNN convolution kernels on a parallel cluster implementing the proposed extension run 6× and 8× faster when considering 4- and 2-bit data operands, respectively, compared to a baseline processing cluster only supporting 8-bit SIMD instructions. With a peak of 2.22 TOPs/s/W, the proposed solution achieves efficiency levels comparable with dedicated DNN inference accelerators and up to three orders of magnitude better than state-of-the-art ARM Cortex-M based microcontroller systems such as the low-end STM32L4 MCU and the high-end STM32H7 MCU.
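
The nibble (4-bit) SIMD sum-of-dot-product idea in this abstract can be emulated in software to show what a single such operation computes. This is only a behavioural sketch: the function names `pack_nibbles`, `unpack_nibble`, and `sdot_nibble` are hypothetical and are not the instruction mnemonics of the proposed ISA extension, and signed two's-complement operands are an assumption.

```python
from typing import List


def pack_nibbles(values: List[int]) -> int:
    """Pack eight signed 4-bit integers (-8..7) into one 32-bit word."""
    assert len(values) == 8
    word = 0
    for i, v in enumerate(values):
        assert -8 <= v <= 7
        word |= (v & 0xF) << (4 * i)  # two's-complement nibble in lane i
    return word


def unpack_nibble(word: int, lane: int) -> int:
    """Extract lane `lane` (0..7) and sign-extend it from 4 bits."""
    n = (word >> (4 * lane)) & 0xF
    return n - 16 if n >= 8 else n


def sdot_nibble(acc: int, a_word: int, b_word: int) -> int:
    """Emulate one sum-of-dot-product: acc + sum over 8 lanes of a_i * b_i."""
    for lane in range(8):
        acc += unpack_nibble(a_word, lane) * unpack_nibble(b_word, lane)
    return acc


a = [1, -2, 3, -4, 5, -6, 7, -8]
b = [1, 1, 1, 1, 2, 2, 2, 2]
acc = sdot_nibble(10, pack_nibbles(a), pack_nibbles(b))
print(acc)  # 4
```

One 32-bit register holds eight 4-bit operands instead of four 8-bit ones, which is where the expected gain over 8-bit SIMD comes from; the sketch says nothing about the fused load described in the abstract.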

    A power-saving modulation technique for time-of-flight range imaging sensors

    Time-of-flight range imaging cameras measure distance and intensity simultaneously for every pixel in an image. With the continued advancement of the technology, a wide variety of new depth sensing applications are emerging; however, a number of these potential applications have stringent electrical power constraints that are difficult to meet with the current state-of-the-art systems. Sensor gain modulation contributes a significant proportion of the total image sensor power consumption, and as higher spatial resolution range image sensors operating at higher modulation frequencies (to achieve better measurement precision) are developed, this proportion is likely to increase. The authors have developed a new sensor modulation technique using resonant circuit concepts that is more power efficient than the standard mode of operation. With a proof of principle system, a 93–96% reduction in modulation drive power was demonstrated across a range of modulation frequencies from 1–11 MHz. Finally, an evaluation of the range imaging performance revealed an improvement in measurement linearity in the resonant configuration due primarily to the more sinusoidal shape of the resonant electrical waveforms, while the average precision values were comparable between the standard and resonant operating modes.

    YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

    Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm² and with a power dissipation of 895 µW in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency, achieving 61.2 TOp/s/W at 0.6 V and 1135 GOp/s/MGE at 1.2 V, respectively.
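
The claim that binary (+1/-1) weights remove the need for multiplications can be made concrete with a short sketch. This is an illustration of the arithmetic only, not the accelerator's datapath: the function name `binary_weight_dot` and the encoding of +1 as bit 1 and -1 as bit 0 are assumptions.

```python
from typing import List


def binary_weight_dot(inputs: List[int], weight_bits: List[int]) -> int:
    """Dot product with weights restricted to +1 (bit 1) or -1 (bit 0).

    Every multiply collapses to an add or a subtract, and each weight
    needs only one bit of storage and I/O bandwidth.
    """
    acc = 0
    for x, bit in zip(inputs, weight_bits):
        acc = acc + x if bit else acc - x
    return acc


inputs = [3, -1, 4, 2]
weight_bits = [1, 0, 0, 1]  # encodes the weights +1, -1, -1, +1
total = binary_weight_dot(inputs, weight_bits)

# Reference computation using ordinary multiplications.
reference = sum(x * (1 if b else -1) for x, b in zip(inputs, weight_bits))
print(total, reference)  # 2 2
```

A convolution is a sliding sequence of such dot products, so the same substitution applies to every output pixel.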

    The UA9 experimental layout

    The UA9 experimental equipment was installed in the CERN-SPS in March '09 with the aim of investigating crystal-assisted collimation in coasting mode. Its basic layout comprises silicon bent crystals acting as primary collimators mounted inside two vacuum vessels. A movable 60 cm long block of tungsten located downstream at about 90 degrees phase advance intercepts the deflected beam. Scintillators, Gas Electron Multiplier chambers and other beam loss monitors measure nuclear loss rates induced by the interaction of the beam halo in the crystal. Roman pots are installed in the path of the deflected particles and are equipped with a Medipix detector to reconstruct the transverse distribution of the impinging beam. Finally, UA9 takes advantage of an LHC-collimator prototype installed close to the Roman pot to help in setting the beam conditions and to analyze the beam deflection efficiency. This paper describes in detail the hardware installed to study the crystal collimation during 2010. Comment: 15 pages, 11 figures, submitted to JINS