38,007 research outputs found

    NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

    Get PDF
    Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3TOp/s/W in a core area of 6.3mm2^2. As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real time interactive demonstrations

    Survey of Inter-satellite Communication for Small Satellite Systems: Physical Layer to Network Layer View

    Get PDF
    Small satellite systems enable whole new class of missions for navigation, communications, remote sensing and scientific research for both civilian and military purposes. As individual spacecraft are limited by the size, mass and power constraints, mass-produced small satellites in large constellations or clusters could be useful in many science missions such as gravity mapping, tracking of forest fires, finding water resources, etc. Constellation of satellites provide improved spatial and temporal resolution of the target. Small satellite constellations contribute innovative applications by replacing a single asset with several very capable spacecraft which opens the door to new applications. With increasing levels of autonomy, there will be a need for remote communication networks to enable communication between spacecraft. These space based networks will need to configure and maintain dynamic routes, manage intermediate nodes, and reconfigure themselves to achieve mission objectives. Hence, inter-satellite communication is a key aspect when satellites fly in formation. In this paper, we present the various researches being conducted in the small satellite community for implementing inter-satellite communications based on the Open System Interconnection (OSI) model. This paper also reviews the various design parameters applicable to the first three layers of the OSI model, i.e., physical, data link and network layer. Based on the survey, we also present a comprehensive list of design parameters useful for achieving inter-satellite communications for multiple small satellite missions. Specific topics include proposed solutions for some of the challenges faced by small satellite systems, enabling operations using a network of small satellites, and some examples of small satellite missions involving formation flying aspects.Comment: 51 pages, 21 Figures, 11 Tables, accepted in IEEE Communications Surveys and Tutorial

    Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

    Full text link
    A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models are different from the traditional large ones in that there is much more variation in their layer shapes and sizes, and often require specialized hardware to exploit sparsity for performance improvement. Thus, many DNN accelerators designed for large DNNs do not perform well on these models. In this work, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and therefore is able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet. We also present an analysis methodology called Eyexam that provides a systematic way of understanding the performance limits for DNN processors as a function of specific characteristics of the DNN model and accelerator design; it applies these characteristics as sequential steps to increasingly tighten the bound on the performance limits.Comment: accepted for publication in IEEE Journal on Emerging and Selected Topics in Circuits and Systems. This extended version on arXiv also includes Eyexam in the appendi

    Wi-PoS : a low-cost, open source ultra-wideband (UWB) hardware platform with long range sub-GHz backbone

    Get PDF
    Ultra-wideband (UWB) localization is one of the most promising approaches for indoor localization due to its accurate positioning capabilities, immunity against multipath fading, and excellent resilience against narrowband interference. However, UWB researchers are currently limited by the small amount of feasible open source hardware that is publicly available. We developed a new open source hardware platform, Wi-PoS, for precise UWB localization based on Decawave’s DW1000 UWB transceiver with several unique features: support of both long-range sub-GHz and 2.4 GHz back-end communication between nodes, flexible interfacing with external UWB antennas, and an easy implementation of the MAC layer with the Time-Annotated Instruction Set Computer (TAISC) framework. Both hardware and software are open source and all parameters of the UWB ranging can be adjusted, calibrated, and analyzed. This paper explains the main specifications of the hardware platform, illustrates design decisions, and evaluates the performance of the board in terms of range, accuracy, and energy consumption. The accuracy of the ranging system was below 10 cm in an indoor lab environment at distances up to 5 m, and accuracy smaller than 5 cm was obtained at 50 and 75 m in an outdoor environment. A theoretical model was derived for predicting the path loss and the influence of the most important ground reflection. At the same time, the average energy consumption of the hardware was very low with only 81 mA for a tag node and 63 mA for the active anchor nodes, permitting the system to run for several days on a mobile battery pack and allowing easy and fast deployment on sites without an accessible power supply or backbone network. The UWB hardware platform demonstrated flexibility, easy installation, and low power consumption

    Fly-By-Wireless for Next Generation Aircraft: Challenges and Potential solutions

    Get PDF
    ”Fly-By-Wireless” paradigm based on wireless connectivity in aircraft has the potential to improve efficiency and flexibility, while reducing weight, fuel consumption and maintenance costs. In this paper, first, the opportunities and challenges for wireless technologies in safety-critical avionics context are discussed. Then, the assessment of such technologies versus avionics requirements is provided in order to select the most appropriate one for a wireless aircraft application. As a result, the design of a Wireless Avionics Network based on Ultra WideBand technology is investigated, considering the issues of determinism, reliability and security

    Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

    Full text link
    Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of BitFusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss and Stripes. In the same area, frequency, and process technology, BitFusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, BitFusion provides 2.6x speedup and 3.9x energy reduction at 45 nm node when BitFusion area and frequency are set to those of Stripes. Scaling to GPU technology node of 16 nm, BitFusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while BitFusion merely consumes 895 milliwatts of power

    Implementation of RTOS to the WSN node

    Get PDF
    Bezdrátové senzorické sieťe zväčša používajú `event-driven` operačné systémy. Táto práca diskutuje výhody nevýhody použitia RTOS v bezdrátových senzorických sieťach. Najvhodnejší RTOS je vybratý a sú podniknuté všetky kroky aby bolo možne demonštrovať schopnosť mikrokontrolérov Gecko od EnergyMicro prevádzkovať tento RTOS s nízkou spotrebou energie a demonštrovať jednoduchú bezdrátovú komunikáciu s Atmel AT86RF212 rádiami.Wireless sensors networks mostly use event-driven OSes. This works discusses pros and cons of using RTOS in wirless sensors networks. A most appropriate RTOS is chosen and all necessary steps are undergone to demonstrate EnergyMicro Gecko MCU's ability to run the RTOS with low energy consumption and demonstrate wireless simple communication with Atmel AT86RF212 radios.
    corecore