1,223 research outputs found

    Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems

    Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While the accuracy of Convolutional Neural Networks (CNNs) has reached a mature and remarkable state, inference latency and throughput remain a major concern, especially when targeting low-cost and low-power embedded platforms. CNN inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and the time-to-solution are much better than with Random Search, achieving up to 15x better results for a short-time search.
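
    The abstract does not spell out how the search works, so the following is only a minimal sketch of the general idea: tabular Q-learning that picks one primitive per layer to minimize total latency. The latency table, the primitive names (`direct`, `gemm`, `winograd`), the switching penalty, and all hyperparameters are hypothetical values chosen for illustration; they are not taken from QS-DNN.

```python
import random

# Hypothetical per-layer latencies (ms) for three made-up primitives.
# Real values would come from measurements on the target device.
LATENCY_MS = [
    {"direct": 9.0, "gemm": 4.0, "winograd": 6.0},
    {"direct": 7.0, "gemm": 5.0, "winograd": 2.0},
    {"direct": 3.0, "gemm": 8.0, "winograd": 4.0},
]
# Hypothetical cost (ms) of switching primitives between consecutive
# layers, standing in for data-layout conversions.
SWITCH_PENALTY_MS = 1.5


def step_cost(layer, prev, choice):
    """Latency of running `layer` with `choice`, given the previous choice."""
    cost = LATENCY_MS[layer][choice]
    if prev is not None and prev != choice:
        cost += SWITCH_PENALTY_MS
    return cost


def q_learning_search(episodes=2000, alpha=0.2, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over per-layer primitive choices."""
    rng = random.Random(seed)
    n_layers = len(LATENCY_MS)
    q = {}  # (layer, previous choice, choice) -> estimated negative cost-to-go

    def best(layer, prev):
        return max(LATENCY_MS[layer], key=lambda c: q.get((layer, prev, c), 0.0))

    for _ in range(episodes):
        prev = None
        for layer in range(n_layers):
            if rng.random() < epsilon:
                choice = rng.choice(list(LATENCY_MS[layer]))  # explore
            else:
                choice = best(layer, prev)  # exploit
            reward = -step_cost(layer, prev, choice)
            future = 0.0
            if layer + 1 < n_layers:
                nxt = best(layer + 1, choice)
                future = q.get((layer + 1, choice, nxt), 0.0)
            key = (layer, prev, choice)
            old = q.get(key, 0.0)
            q[key] = old + alpha * (reward + future - old)
            prev = choice

    # Greedy rollout with the learned table gives the selected plan.
    plan, prev, total = [], None, 0.0
    for layer in range(n_layers):
        choice = best(layer, prev)
        total += step_cost(layer, prev, choice)
        plan.append(choice)
        prev = choice
    return plan, total


if __name__ == "__main__":
    print(q_learning_search())
```

    Keying the table on the previous choice lets the agent weigh a fast primitive against the cost of converting data between layers, which is one plausible reason a learned search could beat picking the fastest primitive per layer in isolation.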

    The Evolution of Neural Network-Based Chart Patterns: A Preliminary Study

    A neural network-based chart pattern represents adaptive parametric features, including non-linear transformations, and a template that can be applied in the feature space. The search for neural network-based chart patterns has remained unexplored despite its potential expressiveness. In this paper, we formulate a general chart pattern search problem to enable cross-representational quantitative comparison of various search schemes. We suggest a HyperNEAT framework that applies state-of-the-art deep neural network techniques to find attractive neural network-based chart patterns. These techniques enable fast evaluation and search of robust patterns and bring a performance gain. The proposed framework successfully found attractive patterns on the Korean stock market. We compared the newly found patterns with those found by different search schemes, showing that the proposed approach has potential. Comment: 8 pages, in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, Germany

    Visualizing classification of natural video sequences using sparse, hierarchical models of cortex.

    Recent work on hierarchical models of visual cortex has reported state-of-the-art accuracy on whole-scene labeling using natural still imagery. This raises the question of whether the reported accuracy may be due to the sophisticated, non-biological back-end supervised classifiers typically used (support vector machines) and/or the limited number of images used in these experiments. In particular, is the model classifying features from the object or the background? Previous work (Landecker, Brumby, et al., COSYNE 2010) proposed tracing the spatial support of a classifier’s decision back through a hierarchical cortical model to determine which parts of the image contributed to the classification, and comparing these to the positions of objects in the scene. In this way, we can go beyond standard measures of accuracy to provide tools for visualizing and analyzing high-level object classification. We now describe new work exploring the extension of these ideas to detection of objects in video sequences of natural scenes.

    General Purpose Computing on Graphics Processing Units for Accelerated Deep Learning in Neural Networks

    Graphics processing units (GPUs) contain a significant number of cores relative to central processing units (CPUs), allowing them to handle high levels of parallelization in multithreading. A general-purpose GPU (GPGPU) is a GPU that has its threads and memory repurposed at the software level to leverage the multithreading made possible by the GPU's hardware, and it is thus an extremely strong platform for intensive computing; there is no hardware difference between GPUs and GPGPUs. Deep learning is one such example of intensive computing that is best implemented on a GPGPU, as the GPGPU's structure of a grid of blocks, each containing processing threads, can handle the immense number of necessary calculations in parallel. A convolutional neural network (CNN) created for financial data analysis shows this advantage in the runtime of the training and testing of a neural network.
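
    As a rough illustration of the grid-of-blocks layout mentioned above, the sketch below simulates in plain Python how each thread in a one-dimensional grid derives the index of the element it handles. The function names are hypothetical, the loops run sequentially rather than in parallel, and the index formula follows the usual CUDA convention (block index times block size plus thread index).

```python
def global_thread_index(block_idx, block_dim, thread_idx):
    """Index of the element handled by one thread in a 1D grid."""
    return block_idx * block_dim + thread_idx


def blocks_needed(n_items, block_dim):
    """Smallest number of blocks that covers n_items (ceiling division)."""
    return (n_items + block_dim - 1) // block_dim


def simulate_vector_add(a, b, block_dim=4):
    """Sequential stand-in for a parallel kernel: one 'thread' per element."""
    n = len(a)
    out = [0] * n
    for block_idx in range(blocks_needed(n, block_dim)):
        for thread_idx in range(block_dim):
            i = global_thread_index(block_idx, block_dim, thread_idx)
            if i < n:  # guard: the last block may be only partially used
                out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    print(simulate_vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

    On a real GPGPU the two loops disappear: every (block, thread) pair executes the loop body at the same time, which is where the runtime advantage comes from.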

    End-to-End Learning of Speech 2D Feature-Trajectory for Prosthetic Hands

    Speech is one of the most common forms of communication in humans. Speech commands are an essential part of multimodal control of prosthetic hands. In the past decades, researchers used automatic speech recognition systems to control prosthetic hands with speech commands. Automatic speech recognition systems learn how to map human speech to text. Researchers then used natural language processing or a look-up table to map the estimated text to a trajectory. However, the performance of conventional speech-controlled prosthetic hands is still unsatisfactory. Recent advancements in general-purpose graphics processing units (GPGPUs) enable intelligent devices to run deep neural networks in real time. Thus, architectures of intelligent systems have rapidly transformed from the paradigm of composite subsystems optimization to the paradigm of end-to-end optimization. In this paper, we propose an end-to-end convolutional neural network (CNN) that maps speech 2D features directly to trajectories for prosthetic hands. The proposed convolutional neural network is lightweight, and thus it runs in real time on an embedded GPGPU. The proposed method can use any type of speech 2D feature that has local correlations in each dimension, such as a spectrogram, MFCC, or PNCC. We omit the speech-to-text step in controlling the prosthetic hand in this paper. The network is written in Python with the Keras library, which has a TensorFlow backend. We optimized the CNN for the NVIDIA Jetson TX2 developer kit. Our experiment on this CNN demonstrates a root-mean-square error of 0.119 and a 20 ms running time to produce trajectory outputs corresponding to the voice input data. To achieve a lower error in real time, we can optimize a similar CNN for a more powerful embedded GPGPU such as the NVIDIA AGX Xavier.
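
    For readers unfamiliar with the error metric quoted above, here is a minimal sketch of a root-mean-square error computation. It assumes the predicted and reference trajectories are given as flat, equal-length sequences of numbers; how the paper actually arranges its trajectory samples is not stated in the abstract.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical trajectory samples, for illustration only.
    print(rmse([0.10, 0.40, 0.80], [0.00, 0.50, 0.90]))
```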

    Convolutional Neural Networks for Speech Controlled Prosthetic Hands

    Speech recognition is one of the key topics in artificial intelligence, as speech is one of the most common forms of communication in humans. Researchers have developed many speech-controlled prosthetic hands in the past decades, utilizing conventional speech recognition systems that combine a neural network with a hidden Markov model. Recent advancements in general-purpose graphics processing units (GPGPUs) enable intelligent devices to run deep neural networks in real time. Thus, state-of-the-art speech recognition systems have rapidly shifted from the paradigm of composite subsystems optimization to the paradigm of end-to-end optimization. However, a low-power embedded GPGPU cannot run these speech recognition systems in real time. In this paper, we show the development of deep convolutional neural networks (CNNs) for speech control of prosthetic hands that run in real time on an NVIDIA Jetson TX2 developer kit. First, the device captures speech and converts it into 2D features (such as a spectrogram). The CNN receives the 2D features and classifies the hand gestures. Finally, the hand gesture classes are sent to the prosthetic hand motion control system. The whole system is written in Python with Keras, a deep learning library that has a TensorFlow backend. Our experiments on the CNN demonstrate 91% accuracy and a 2 ms running time for producing hand gestures (text output) from speech commands, which can be used to control the prosthetic hands in real time. Comment: 2019 First International Conference on Transdisciplinary AI (TransAI), Laguna Hills, California, USA, 2019, pp. 35-4

    ???????????? ??????????????? ????????? ???????????? ?????? ??????

    Department of Mechanical Engineering. Unmanned aerial vehicles (UAVs) are widely used in areas such as exploration, transportation, and rescue because of their light weight, low cost, high mobility, and intelligence. Such an intelligent system consists of highly integrated embedded systems with a microprocessor that performs specific tasks by running algorithms or processing data. In particular, image processing is one of the core technologies for handling important tasks such as target tracking, positioning, and visual servoing with a vision system. However, it often imposes a heavy computational burden, and an additional micro PC controller must be used alongside the flight computer to process image data. The performance of this controller is limited by power, size, and weight constraints. Therefore, efficient image processing techniques that take computing load and hardware resources into account are needed for real-time operation on embedded systems. The objective of this thesis research is to develop an efficient image processing framework for embedded systems that uses neural networks and various optimized computation techniques to balance computing speed, resource usage, and accuracy. Image processing techniques have been proposed and tested for managing computing resources and running high-performance missions on embedded systems. Graphics processing units (GPUs) available on the market can be used for parallel computing to accelerate computation. Multiple cores within central processing units (CPUs) are used, for example through multi-threading, during data upload and download between the CPU and the GPU. Several methods have been proposed to minimize computing load. The first is visualization of a convolutional neural network (CNN) that can perform localization and detection simultaneously. The second is region proposal for the CNN input area through simple image processing, which lets the algorithm avoid full-frame processing. Finally, surplus computing resources can be saved by controlling transient performance, for example by limiting the frame rate (FPS). These optimization methods have been applied experimentally to a ground vehicle and quadrotor UAVs, verifying that they optimize processing in an embedded environment by saving CPU and memory resources. In addition, they can support tasks such as object detection, path planning, and obstacle avoidance. Through these optimizations and algorithms, the embedded system shows a number of improvements over existing approaches. Given the system characteristics required to port useful algorithms to embedded systems, the methods developed in this research can be further applied to various practical applications.
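
    The frame-rate limitation mentioned in this abstract can be sketched as follows. This is only one simple way to cap FPS, assuming a loop that sleeps away whatever is left of each frame period; the function names and the injectable `clock` and `sleep` parameters are hypothetical and do not come from the thesis.

```python
import time


def sleep_needed(frame_start, now, max_fps):
    """Seconds left in the current frame period (never negative)."""
    period = 1.0 / max_fps
    return max(0.0, period - (now - frame_start))


def run_limited(process_frame, n_frames, max_fps,
                clock=time.monotonic, sleep=time.sleep):
    """Call process_frame(i) for each frame, at most max_fps times per second."""
    for i in range(n_frames):
        start = clock()
        process_frame(i)
        # Idle for the rest of the period so the CPU is free for other work.
        sleep(sleep_needed(start, clock(), max_fps))


if __name__ == "__main__":
    run_limited(lambda i: print("frame", i), n_frames=3, max_fps=10.0)
```

    Passing the clock and sleep functions as parameters keeps the limiter testable without real waiting; on a vehicle the defaults would be used.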