    COMPUTATION OFFLOADING DESIGN FOR DEEP NEURAL NETWORK INFERENCE ON IoT DEVICES

    In recent times, advances in Internet-of-Things (IoT) and Deep Neural Network (DNN) technologies have significantly increased the accuracy and speed of a variety of smart applications. However, one barrier to deploying DNNs on IoT devices is the limited computational capability of those devices relative to the computationally expensive task of DNN inference. Computation offloading addresses this problem by moving DNN computation to cloud servers. In this thesis we propose a collaborative computation offloading solution, in which some of the work is done on the IoT device and the remainder is done by the cloud server. This collaborative approach has two components. First, the input image to the DNN is partitioned into multiple pieces, allowing the pieces to be processed in parallel and speeding up inference. Second, the DNN is split between two of its layers, so that layers before the split point are processed on the IoT device and layers after the split point are processed by the cloud server. We investigated several strategies for partitioning the image and splitting the DNN, and evaluated the results on several commonly used DNNs: LeNet-5, AlexNet, and VGG-16. The results show that collaborative computation offloading sped up inference on IoT devices by 35-40% compared with non-collaborative methods.
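    The layer-split component can be sketched in a few lines. The following is a minimal PyTorch-style illustration, not the thesis's actual implementation: `split_inference` and `split_point` are hypothetical names, the model is assumed to be an `nn.Sequential`, and both the image-partitioning component and the network transfer of the intermediate activation are elided.

    ```python
    # Minimal sketch of layer-wise DNN splitting (hypothetical names, not the
    # thesis's implementation). Layers before the split point run on the IoT
    # device; the intermediate activation is then handed to the cloud server,
    # which runs the remaining layers.
    import torch
    import torch.nn as nn

    def split_inference(model: nn.Sequential, x: torch.Tensor, split_point: int) -> torch.Tensor:
        with torch.no_grad():
            # Device side: layers [0, split_point)
            for layer in model[:split_point]:
                x = layer(x)
            # In practice x would be serialized and sent over the network here.
            # Server side: layers [split_point, end)
            for layer in model[split_point:]:
                x = layer(x)
        return x
    ```

    The choice of `split_point` trades device-side compute against the size of the intermediate tensor that must cross the network, which is the main knob such splitting strategies tune.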

    Enabling Deep Learning on Edge Devices

    Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, and reinforcement learning. High-performing DNNs rely on intensive resource consumption. For example, training a DNN requires large dynamic memory, a large-scale dataset, and a large number of computations (a long training time); even inference with a DNN demands a large amount of static storage, computations (a long inference time), and energy. Therefore, state-of-the-art DNNs are often deployed on cloud servers with a large number of supercomputers, a high-bandwidth communication bus, a shared storage infrastructure, and a high power supply. Recently, emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, require us to deploy DNNs on resource-constrained edge devices. Compared to a cloud server, edge devices often have a rather small amount of resources. To deploy DNNs on edge devices, we need to reduce the size of DNNs, i.e., we target a better trade-off between resource consumption and model accuracy. In this dissertation, we studied four edge intelligence scenarios, i.e., Inference on Edge Devices, Adaptation on Edge Devices, Learning on Edge Devices, and Edge-Server Systems, and developed different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of the DNNs in each scenario.
    Comment: PhD thesis at ETH Zurich
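    As one concrete example of what reducing redundancy can mean, the sketch below applies standard magnitude pruning to a model's linear layers. This illustrates the general technique, not necessarily the dissertation's specific methods; `magnitude_prune` and the `sparsity` parameter are hypothetical names.

    ```python
    # Illustrative magnitude pruning (a standard redundancy-reduction technique,
    # not necessarily this dissertation's method): zero out the smallest-magnitude
    # weights in each linear layer to shrink the effective model size.
    import torch
    import torch.nn as nn

    def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> nn.Module:
        with torch.no_grad():
            for module in model.modules():
                if isinstance(module, nn.Linear):
                    w = module.weight
                    k = int(w.numel() * sparsity)
                    if k == 0:
                        continue
                    # Threshold = k-th smallest absolute weight; zero weights at or below it.
                    threshold = w.abs().flatten().kthvalue(k).values
                    module.weight.mul_((w.abs() > threshold).float())
        return model
    ```

    After pruning, the zeroed weights can be stored sparsely or the model fine-tuned to recover accuracy, which is where the resource-versus-accuracy trade-off the abstract describes is actually negotiated.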