
    Neuro-memristive Circuits for Edge Computing: A review

    The volume, veracity, variability, and velocity of data produced by the ever-increasing network of sensors connected to the Internet pose challenges for the power management, scalability, and sustainability of cloud computing infrastructure. Increasing the data processing capability of edge computing devices at lower power requirements can reduce several overheads for cloud computing solutions. This paper provides a review of neuromorphic CMOS-memristive architectures that can be integrated into edge computing devices. We discuss why neuromorphic architectures are useful for edge devices and present the advantages, drawbacks, and open problems in the field of neuro-memristive circuits for edge computing.
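
    The abstract does not detail the circuits themselves; as a rough, hedged illustration of the core operation memristive crossbars are commonly used to accelerate, the Python sketch below (names and values are hypothetical, not taken from the paper) computes a vector-matrix product the way an idealized crossbar would, with conductances encoding weights and input voltages encoding activations.

        # Minimal sketch (not from the paper): an idealized memristive crossbar
        # performing a vector-matrix product in one "analog" step, the core
        # operation such architectures accelerate for neural inference.
        import numpy as np

        def crossbar_vmm(voltages, conductances):
            """Ideal crossbar: each output current is the dot product of the
            input voltage vector with one column of the conductance matrix
            (Ohm's law plus Kirchhoff's current law), ignoring wire resistance
            and device noise."""
            return voltages @ conductances

        rng = np.random.default_rng(0)
        weights = rng.uniform(0.0, 1.0, size=(4, 3))   # conductances encode weights
        inputs = rng.uniform(0.0, 0.5, size=4)         # input voltages encode activations
        print(crossbar_vmm(inputs, weights))           # column currents = weighted sums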

    Efficient Object Detection in Mobile and Embedded Devices with Deep Neural Networks

    Neural networks have become the standard for high-accuracy computer vision. These algorithms can be built with arbitrarily large architectures to handle the ever-growing complexity of the data they process. State-of-the-art neural network architectures are primarily concerned with increasing recognition accuracy during inference on an image, which creates an insatiable demand for energy and compute power. These models are primarily targeted at dense compute units such as GPUs. In recent years, demand has grown to run these models in limited-capacity environments such as smartphones; however, even the most compact variants of these state-of-the-art networks constantly push the boundaries of the power envelope under which they run. With the emergence of the Internet of Things, it is becoming a priority to enable mobile systems to perform image recognition at the edge, but with small energy requirements. This thesis focuses on the design and implementation of an object detection neural network that attempts to solve this problem, providing reasonable accuracy with extremely low compute power requirements. This is achieved by re-imagining the meta-architecture of traditional object detection models and devising a mechanism to classify and localize objects through a set of neural-network-based algorithms better suited to mobile and embedded devices. The main contributions of this thesis are: (i) an image processing algorithm better suited to preparing data for consumption, which takes advantage of the characteristics of the ISP available in these devices; (ii) a neural network architecture that maintains acceptable accuracy targets with minimal computational requirements by making efficient use of basic neural algorithms; and (iii) a programming framework describing how these systems can be implemented most efficiently, optimized for the underlying hardware units available in these devices by taking memory and computation restrictions into account.
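
    The abstract does not name the specific layers used; the Python sketch below is a generic, hedged illustration (not the thesis's actual architecture) of one common way such compute savings are obtained on mobile hardware, comparing the multiply-accumulate cost of a standard convolution with a depthwise separable one. All shapes and numbers are hypothetical.

        # Hedged illustration (not the thesis's actual design): multiply-accumulate
        # cost of a standard convolution versus a depthwise separable convolution,
        # a common building block for fitting detectors on mobile hardware.

        def standard_conv_macs(h, w, c_in, c_out, k):
            # every output pixel mixes all input channels over a k x k window
            return h * w * c_out * c_in * k * k

        def depthwise_separable_macs(h, w, c_in, c_out, k):
            # one k x k filter per input channel, then a 1 x 1 pointwise channel mix
            return h * w * c_in * k * k + h * w * c_in * c_out

        h, w, c_in, c_out, k = 56, 56, 128, 256, 3
        std = standard_conv_macs(h, w, c_in, c_out, k)
        sep = depthwise_separable_macs(h, w, c_in, c_out, k)
        print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio: {std / sep:.1f}x")

    With these toy dimensions the separable block needs roughly an order of magnitude fewer operations, which is the kind of trade-off that makes on-device detection feasible at low power.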

    JALAD: Joint Accuracy- and Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution

    Recent years have witnessed rapid growth in deep-network-based services and applications. A practical and critical problem has thus emerged: how to deploy deep neural network models so that they can be executed efficiently. Conventional cloud-based approaches usually run the deep models in data center servers, causing large latency because a significant amount of data has to be transferred from the network edge to the data center. In this paper, we propose JALAD, a joint accuracy- and latency-aware execution framework, which decouples a deep neural network so that part of it runs on edge devices and the rest inside the conventional cloud, while only a minimal amount of data is transferred between them. Though the idea seems straightforward, we face several challenges: i) how to find the best partition of a deep structure; ii) how to deploy the component at an edge device that has only limited computation power; and iii) how to minimize the overall execution latency. Our answers to these questions are a set of strategies in JALAD, including 1) a normalization-based in-layer data compression strategy that jointly considers compression rate and model accuracy; 2) a latency-aware deep decoupling strategy to minimize the overall execution latency; and 3) an edge-cloud structure adaptation strategy that dynamically changes the decoupling for different network conditions. Experiments demonstrate that our solution can significantly reduce execution latency: it speeds up overall inference while keeping the model accuracy loss within a guaranteed bound.
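
    As a hedged sketch of the partition-point search the abstract describes (JALAD's actual compression scheme and latency model are in the paper; the function and numbers below are hypothetical), the Python snippet below picks the layer after which to cut a network so that edge compute, transfer of the intermediate tensor, and cloud compute add up to the smallest latency.

        # Toy partition search with made-up per-layer numbers; not JALAD's
        # actual algorithm, just the shape of the edge/cloud latency trade-off.

        def pick_partition(edge_ms, cloud_ms, tensor_mbits, uplink_mbps):
            """edge_ms[i] / cloud_ms[i]: per-layer latency on each side.
            tensor_mbits[i]: size of the data uploaded if we cut after layer i,
            with tensor_mbits[0] being the raw input (everything in the cloud).
            Returns (cut, latency_ms), where layers [0, cut) run on the edge."""
            n = len(edge_ms)
            best = (0, float("inf"))
            for cut in range(n + 1):
                edge = sum(edge_ms[:cut])
                cloud = sum(cloud_ms[cut:])
                transfer = tensor_mbits[cut] / uplink_mbps * 1000.0  # ms
                total = edge + cloud + transfer
                if total < best[1]:
                    best = (cut, total)
            return best

        # toy example: 4 layers, a slow edge device, a fast cloud, a 20 Mbit/s uplink
        edge_ms  = [5.0, 8.0, 12.0, 20.0]
        cloud_ms = [0.5, 0.8, 1.2, 2.0]
        mbits    = [8.0, 2.0, 1.0, 0.5, 0.1]  # input, then each layer's (compressed) output
        print(pick_partition(edge_ms, cloud_ms, mbits, uplink_mbps=20.0))

    The real framework additionally constrains the accuracy loss introduced by compressing the intermediate data and re-runs this decision as network conditions change; the toy search above ignores both aspects.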