287 research outputs found

    Distilling Word Embeddings: An Encoding Approach

    Full text link
    Distilling knowledge from a well-trained cumbersome network to a small one has recently become a new research topic, as lightweight neural networks with high performance are particularly in need in various resource-restricted systems. This paper addresses the problem of distilling word embeddings for NLP tasks. We propose an encoding approach to distill task-specific knowledge from a set of high-dimensional embeddings, which can reduce model complexity by a large margin as well as retain high accuracy, showing a good compromise between efficiency and performance. Experiments in two tasks reveal the phenomenon that distilling knowledge from cumbersome embeddings is better than directly training neural networks with small embeddings.Comment: Accepted by CIKM-16 as a short paper, and by the Representation Learning for Natural Language Processing (RL4NLP) Workshop @ACL-16 for presentatio

    Accelerating Training of Deep Neural Networks via Sparse Edge Processing

    Full text link
    We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements. This novel architecture introduces the notion of edge-processing to provide flexibility and combines junction pipelining and operational parallelization to speed up training. The overall effect is to reduce network complexity by factors up to 30x and training time by up to 35x relative to GPUs, while maintaining high fidelity of inference results. This has the potential to enable extensive parameter searches and development of the largely unexplored theoretical foundation of DNNs. The architecture automatically adapts itself to different network sizes given available hardware resources. As proof of concept, we show results obtained for different bit widths.Comment: Presented at the 26th International Conference on Artificial Neural Networks (ICANN) 2017 in Alghero, Ital

    Compression of Deep Neural Networks on the Fly

    Full text link
    Thanks to their state-of-the-art performance, deep neural networks are increasingly used for object recognition. To achieve these results, they use millions of parameters to be trained. However, when targeting embedded applications the size of these models becomes problematic. As a consequence, their usage on smartphones or other resource limited devices is prohibited. In this paper we introduce a novel compression method for deep neural networks that is performed during the learning phase. It consists in adding an extra regularization term to the cost function of fully-connected layers. We combine this method with Product Quantization (PQ) of the trained weights for higher savings in storage consumption. We evaluate our method on two data sets (MNIST and CIFAR10), on which we achieve significantly larger compression rates than state-of-the-art methods

    Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

    Get PDF
    Object detection is considered one of the most challenging problems in this field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art in DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it still remains very challenging for leveraging this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13%, and an average speedup of ~3.3X for objection detection in video compared to the original YOLOv2, leading Fast YOLO to run an average of ~18FPS on a Nvidia Jetson TX1 embedded system

    Scalable Compression of Deep Neural Networks

    Full text link
    Deep neural networks generally involve some layers with mil- lions of parameters, making them difficult to be deployed and updated on devices with limited resources such as mobile phones and other smart embedded systems. In this paper, we propose a scalable representation of the network parameters, so that different applications can select the most suitable bit rate of the network based on their own storage constraints. Moreover, when a device needs to upgrade to a high-rate network, the existing low-rate network can be reused, and only some incremental data are needed to be downloaded. We first hierarchically quantize the weights of a pre-trained deep neural network to enforce weight sharing. Next, we adaptively select the bits assigned to each layer given the total bit budget. After that, we retrain the network to fine-tune the quantized centroids. Experimental results show that our method can achieve scalable compression with graceful degradation in the performance.Comment: 5 pages, 4 figures, ACM Multimedia 201
    • …
    corecore