11,031 research outputs found

    Recent Advances in Efficient Computation of Deep Convolutional Neural Networks

    Full text link
    Deep neural networks have evolved remarkably over the past few years and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks also continue to increase. This will pose a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementation of deep neural networks, a batch of accelerators based on FPGA/ASIC have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression and accelerator design from both algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher-student networks, compact network design and hardware accelerators. Finally, we will introduce and discuss a few possible future directions.Comment: 14 pages, 3 figure

    Event-triggered Natural Hazard Monitoring with Convolutional Neural Networks on the Edge

    Full text link
    In natural hazard warning systems fast decision making is vital to avoid catastrophes. Decision making at the edge of a wireless sensor network promises fast response times but is limited by the availability of energy, data transfer speed, processing and memory constraints. In this work we present a realization of a wireless sensor network for hazard monitoring based on an array of event-triggered single-channel micro-seismic sensors with advanced signal processing and characterization capabilities based on a novel co-detection technique. On the one hand we leverage an ultra-low power, threshold-triggering circuit paired with on-demand digital signal acquisition capable of extracting relevant information exactly and efficiently at times when it matters most and consequentially not wasting precious resources when nothing can be observed. On the other hand we utilize machine-learning-based classification implemented on low-power, off-the-shelf microcontrollers to avoid false positive warnings and to actively identify humans in hazard zones. The sensors' response time and memory requirement is substantially improved by quantizing and pipelining the inference of a convolutional neural network. In this way, convolutional neural networks that would not run unmodified on a memory constrained device can be executed in real-time and at scale on low-power embedded devices. A field study with our system is running on the rockfall scarp of the Matterhorn H\"ornligrat at 3500 m a.s.l. since 08/2018

    FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

    Full text link
    Convolutional Neural Networks have rapidly become the most successful machine learning algorithm, enabling ubiquitous machine vision and intelligent decisions on even embedded computing-systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the promising opportunities is leveraging reduced-precision representations for inputs, activations and model parameters. The resulting scalability in performance, power efficiency and storage footprint provides interesting design compromises in exchange for a small reduction in accuracy. FPGAs are ideal for exploiting low-precision inference engines leveraging custom precisions to achieve the required numerical accuracy for a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool which enables design space exploration and automates the creation of fully customized inference engines on FPGAs. Given a neural network description, the tool optimizes for given platforms, design targets and a specific precision. We introduce formalizations of resource cost functions and performance predictions, and elaborate on the optimization algorithms. Finally, we evaluate a selection of reduced precision neural networks ranging from CIFAR-10 classifiers to YOLO-based object detection on a range of platforms including PYNQ and AWS\,F1, demonstrating new unprecedented measured throughput at 50TOp/s on AWS-F1 and 5TOp/s on embedded devices.Comment: to be published in ACM TRETS Special Edition on Deep Learnin

    A deep learning based solution for construction equipment detection: from development to deployment

    Full text link
    This paper aims at providing researchers and engineering professionals with a practical and comprehensive deep learning based solution to detect construction equipment from the very first step of its development to the last one which is deployment. This paper focuses on the last step of deployment. The first phase of solution development, involved data preparation, model selection, model training, and model evaluation. The second phase of the study comprises of model optimization, application specific embedded system selection, and economic analysis. Several embedded systems were proposed and compared. The review of the results confirms superior real-time performance of the solutions with a consistent above 90% rate of accuracy. The current study validates the practicality of deep learning based object detection solutions for construction scenarios. Moreover, the detailed knowledge, presented in this study, can be employed for several purposes such as, safety monitoring, productivity assessments, and managerial decisions.Comment: 17 pages, 16 figures, 6 table

    NetScore: Towards Universal Metrics for Large-scale Performance Analysis of Deep Neural Networks for Practical On-Device Edge Usage

    Full text link
    Much of the focus in the design of deep neural networks has been on improving accuracy, leading to more powerful yet highly complex network architectures that are difficult to deploy in practical scenarios, particularly on edge devices such as mobile and other consumer devices given their high computational and memory requirements. As a result, there has been a recent interest in the design of quantitative metrics for evaluating deep neural networks that accounts for more than just model accuracy as the sole indicator of network performance. In this study, we continue the conversation towards universal metrics for evaluating the performance of deep neural networks for practical on-device edge usage. In particular, we propose a new balanced metric called NetScore, which is designed specifically to provide a quantitative assessment of the balance between accuracy, computational complexity, and network architecture complexity of a deep neural network, which is important for on-device edge operation. In what is one of the largest comparative analysis between deep neural networks in literature, the NetScore metric, the top-1 accuracy metric, and the popular information density metric were compared across a diverse set of 60 different deep convolutional neural networks for image classification on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset. The evaluation results across these three metrics for this diverse set of networks are presented in this study to act as a reference guide for practitioners in the field. The proposed NetScore metric, along with the other tested metrics, are by no means perfect, but the hope is to push the conversation towards better universal metrics for evaluating deep neural networks for use in practical on-device edge scenarios to help guide practitioners in model design for such scenarios.Comment: 9 page

    Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition

    Full text link
    Generative Adversarial Networks (GAN) have attracted much research attention recently, leading to impressive results for natural image generation. However, to date little success was observed in using GAN generated images for improving classification tasks. Here we attempt to explore, in the context of car license plate recognition, whether it is possible to generate synthetic training data using GAN to improve recognition accuracy. With a carefully-designed pipeline, we show that the answer is affirmative. First, a large-scale image set is generated using the generator of GAN, without manual annotation. Then, these images are fed to a deep convolutional neural network (DCNN) followed by a bidirectional recurrent neural network (BRNN) with long short-term memory (LSTM), which performs the feature learning and sequence labelling. Finally, the pre-trained model is fine-tuned on real images. Our experimental results on a few data sets demonstrate the effectiveness of using GAN images: an improvement of 7.5% over a strong baseline with moderate-sized real data being available. We show that the proposed framework achieves competitive recognition accuracy on challenging test datasets. We also leverage the depthwise separate convolution to construct a lightweight convolutional RNN, which is about half size and 2x faster on CPU. Combining this framework and the proposed pipeline, we make progress in performing accurate recognition on mobile and embedded devices

    Digital Passport: A Novel Technological Strategy for Intellectual Property Protection of Convolutional Neural Networks

    Full text link
    In order to prevent deep neural networks from being infringed by unauthorized parties, we propose a generic solution which embeds a designated digital passport into a network, and subsequently, either paralyzes the network functionalities for unauthorized usages or maintain its functionalities in the presence of a verified passport. Such a desired network behavior is successfully demonstrated in a number of implementation schemes, which provide reliable, preventive and timely protections against tens of thousands of fake-passport deceptions. Extensive experiments also show that the deep neural network performance under unauthorized usages deteriorate significantly (e.g. with 33% to 82% reductions of CIFAR10 classification accuracies), while networks endorsed with valid passports remain intact.Comment: This paper proposes a new timely IPR solution that embed digital passports into CNN models to prevent the unauthorized network usage (i.e. infringement) by paralyzing the networks while maintaining its functionality for verified user

    Compact Generalized Non-local Network

    Full text link
    The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos. Although having shown excellent performance, it lacks the mechanism to model the interactions between positions across channels, which are of vital importance in recognizing fine-grained objects and actions. To address this limitation, we generalize the non-local module and take the correlations between the positions of any two channels into account. This extension utilizes the compact representation for multiple kernel functions with Taylor expansion that makes the generalized non-local module in a fast and low-complexity computation flow. Moreover, we implement our generalized non-local method within channel groups to ease the optimization. Experimental results illustrate the clear-cut improvements and practical applicability of the generalized non-local module on both fine-grained object recognition and video classification. Code is available at: https://github.com/KaiyuYue/cgnl-network.pytorch.Comment: Technical report; To appear at NIPS 2018; Code is available at https://github.com/KaiyuYue/cgnl-network.pytorc

    Learning a Repression Network for Precise Vehicle Search

    Full text link
    The growing explosion in the use of surveillance cameras in public security highlights the importance of vehicle search from large-scale image databases. Precise vehicle search, aiming at finding out all instances for a given query vehicle image, is a challenging task as different vehicles will look very similar to each other if they share same visual attributes. To address this problem, we propose the Repression Network (RepNet), a novel multi-task learning framework, to learn discriminative features for each vehicle image from both coarse-grained and detailed level simultaneously. Besides, benefited from the satisfactory accuracy of attribute classification, a bucket search method is proposed to reduce the retrieval time while still maintaining competitive performance. We conduct extensive experiments on the revised VehcileID dataset. Experimental results show that our RepNet achieves the state-of-the-art performance and the bucket search method can reduce the retrieval time by about 24 times

    Distributed Machine Learning in Materials that Couple Sensing, Actuation, Computation and Communication

    Full text link
    This paper reviews machine learning applications and approaches to detection, classification and control of intelligent materials and structures with embedded distributed computation elements. The purpose of this survey is to identify desired tasks to be performed in each type of material or structure (e.g., damage detection in composites), identify and compare common approaches to learning such tasks, and investigate models and training paradigms used. Machine learning approaches and common temporal features used in the domains of structural health monitoring, morphable aircraft, wearable computing and robotic skins are explored. As the ultimate goal of this research is to incorporate the approaches described in this survey into a robotic material paradigm, the potential for adapting the computational models used in these applications, and corresponding training algorithms, to an amorphous network of computing nodes is considered. Distributed versions of support vector machines, graphical models and mixture models developed in the field of wireless sensor networks are reviewed. Potential areas of investigation, including possible architectures for incorporating machine learning into robotic nodes, training approaches, and the possibility of using deep learning approaches for automatic feature extraction, are discussed
    • …
    corecore