2,934 research outputs found

    JALAD: Joint Accuracy- and Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution

    Full text link
    Recent years have witnessed a rapid growth of deep-network based services and applications. A practical and critical problem thus has emerged: how to effectively deploy the deep neural network models such that they can be executed efficiently. Conventional cloud-based approaches usually run the deep models in data center servers, causing large latency because a significant amount of data has to be transferred from the edge of network to the data center. In this paper, we propose JALAD, a joint accuracy- and latency-aware execution framework, which decouples a deep neural network so that a part of it will run at edge devices and the other part inside the conventional cloud, while only a minimum amount of data has to be transferred between them. Though the idea seems straightforward, we are facing challenges including i) how to find the best partition of a deep structure; ii) how to deploy the component at an edge device that only has limited computation power; and iii) how to minimize the overall execution latency. Our answers to these questions are a set of strategies in JALAD, including 1) A normalization based in-layer data compression strategy by jointly considering compression rate and model accuracy; 2) A latency-aware deep decoupling strategy to minimize the overall execution latency; and 3) An edge-cloud structure adaptation strategy that dynamically changes the decoupling for different network conditions. Experiments demonstrate that our solution can significantly reduce the execution latency: it speeds up the overall inference execution with a guaranteed model accuracy loss.Comment: conference, copyright transfered to IEE

    Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks

    Full text link
    Deep Neural Networks (DNNs) have gained immense success in cognitive applications and greatly pushed today's artificial intelligence forward. The biggest challenge in executing DNNs is their extremely data-extensive computations. The computing efficiency in speed and energy is constrained when traditional computing platforms are employed in such computational hungry executions. Spiking neuromorphic computing (SNC) has been widely investigated in deep networks implementation own to their high efficiency in computation and communication. However, weights and signals of DNNs are required to be quantized when deploying the DNNs on the SNC, which results in unacceptable accuracy loss. %However, the system accuracy is limited by quantizing data directly in deep networks deployment. Previous works mainly focus on weights discretize while inter-layer signals are mainly neglected. In this work, we propose to represent DNNs with fixed integer inter-layer signals and fixed-point weights while holding good accuracy. We implement the proposed DNNs on the memristor-based SNC system as a deployment example. With 4-bit data representation, our results show that the accuracy loss can be controlled within 0.02% (2.3%) on MNIST (CIFAR-10). Compared with the 8-bit dynamic fixed-point DNNs, our system can achieve more than 9.8x speedup, 89.1% energy saving, and 30% area saving.Comment: 6 pages, 4 figure
    • …
    corecore