946 research outputs found

    JALAD: Joint Accuracy- and Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution

    Full text link
    Recent years have witnessed rapid growth in deep-network-based services and applications. A practical and critical problem has thus emerged: how to deploy deep neural network models so that they can be executed efficiently. Conventional cloud-based approaches usually run the deep models in data-center servers, incurring large latency because a significant amount of data has to be transferred from the network edge to the data center. In this paper, we propose JALAD, a joint accuracy- and latency-aware execution framework, which decouples a deep neural network so that one part runs on edge devices and the other inside the conventional cloud, while only a minimal amount of data has to be transferred between them. Though the idea seems straightforward, we face several challenges: i) how to find the best partition of a deep structure; ii) how to deploy the component at an edge device that has only limited computation power; and iii) how to minimize the overall execution latency. Our answers to these questions are a set of strategies in JALAD, including 1) a normalization-based in-layer data compression strategy that jointly considers compression rate and model accuracy; 2) a latency-aware deep decoupling strategy to minimize the overall execution latency; and 3) an edge-cloud structure adaptation strategy that dynamically changes the decoupling under different network conditions. Experiments demonstrate that our solution can significantly reduce execution latency: it speeds up overall inference with a guaranteed bound on model accuracy loss. Comment: conference; copyright transferred to IEEE
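
    As a concrete illustration of the kind of decision JALAD automates, the sketch below enumerates candidate split points and quantization levels and keeps the lowest-latency combination whose estimated accuracy loss stays within budget. All function names, the cost model, and the 32-bit activation baseline are our own illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of latency-aware decoupling in the spirit of JALAD.
# Helper names and the cost model are assumptions for illustration only.

def total_latency(split, edge_ms, cloud_ms, act_bytes, bits, bandwidth_bps):
    """End-to-end latency when layers [0, split) run on the edge.
    act_bytes[s] is the size of the activations crossing the split at s."""
    edge = sum(edge_ms[:split])            # edge-side compute
    cloud = sum(cloud_ms[split:])          # cloud-side compute
    # In-layer compression: quantizing 32-bit activations down to `bits`
    # bits shrinks the data that must cross the network.
    transfer_ms = (act_bytes[split] * bits / 32) * 8 / bandwidth_bps * 1000
    return edge + transfer_ms + cloud

def best_split(edge_ms, cloud_ms, act_bytes, acc_loss, bandwidth_bps,
               max_acc_loss=0.01, bit_choices=(8, 6, 4)):
    """Pick the (split, bits) pair with minimum latency whose estimated
    accuracy loss acc_loss[split][bits] stays within the budget."""
    best = None
    for split in range(len(edge_ms) + 1):
        for bits in bit_choices:
            if acc_loss[split][bits] > max_acc_loss:
                continue                   # compression too aggressive here
            lat = total_latency(split, edge_ms, cloud_ms,
                                act_bytes, bits, bandwidth_bps)
            if best is None or lat < best[0]:
                best = (lat, split, bits)
    return best                            # (latency_ms, split, bits)
```

    Re-running this search whenever measured bandwidth changes gives the dynamic edge-cloud adaptation the abstract describes.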

    CLEVER: a cooperative and cross-layer approach to video streaming in HetNets

    Get PDF
    We investigate the problem of providing a video streaming service to mobile users in a heterogeneous cellular network composed of micro e-NodeBs (eNBs) and macro e-NodeBs (MeNBs). More specifically, we target a cross-layer dynamic allocation of the bandwidth resources available over a set of eNBs and one MeNB, with the goal of reducing the per-chunk delay experienced by users. After optimally formulating the problem of minimizing the chunk delay, we detail the Cross LayEr Video stReaming (CLEVER) algorithm to tackle it in practice. CLEVER makes allocation decisions on the basis of information retrieved from the application layer as well as from lower layers. Results, obtained over two representative case studies, show that CLEVER is able to limit the chunk delay while also reducing the amount of bandwidth reserved for offloaded users on the MeNB, as well as the number of offloaded users. In addition, we show that CLEVER clearly outperforms two selected reference algorithms while remaining very close to a best bound. Finally, we show that our solution is able to achieve high fairness indices and good levels of Quality of Experience (QoE).
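
    To make the cross-layer idea tangible, here is a minimal, hypothetical scheduling round in the spirit of CLEVER: application-layer state (chunk size, playback buffer) is combined with lower-layer capacity so that the most urgent chunks are served first, spilling onto the MeNB only when a user's serving eNB is saturated. The data structures and the urgency heuristic are invented for illustration; the paper's optimal formulation is more involved.

```python
# Hypothetical cross-layer allocation round in the spirit of CLEVER.
# `users` is a list of dicts with 'id', 'chunk_bits', 'buffer_s' (seconds
# of video buffered), and 'enb' (serving cell); capacities are in bit/s.

def allocate(users, enb_capacity, menb_capacity):
    """Serve the most urgent chunks first; offload to the MeNB only when
    the serving eNB lacks capacity. Returns (shares, offloaded_ids)."""
    order = sorted(users, key=lambda u: u["buffer_s"])  # least buffer first
    shares, offloaded = {}, []
    for u in order:
        # Bit rate needed to fetch the next chunk before the buffer drains.
        need = u["chunk_bits"] / max(u["buffer_s"], 0.1)
        cell = u["enb"]
        if enb_capacity[cell] >= need:
            enb_capacity[cell] -= need
            shares[u["id"]] = need
        elif menb_capacity >= need:        # spill onto the macro cell
            menb_capacity -= need
            offloaded.append(u["id"])
            shares[u["id"]] = need
        # else: the user waits for the next allocation round
    return shares, offloaded
```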

    Enhancing Mobile Capacity through Generic and Efficient Resource Sharing

    Get PDF
    Mobile computing devices are becoming indispensable in every aspect of human life, but diverse hardware limits keep current mobile devices far from ideal for satisfying the performance requirements of modern mobile applications and for being used anytime, anywhere. Mobile Cloud Computing (MCC) could be a viable solution to bypass these limits by enhancing mobile capacity through cooperative resource sharing, but it is challenging due to the heterogeneity of mobile devices in both hardware and software. Traditional schemes either restrict sharing to a specific type of hardware resource within individual applications, which requires tremendous reprogramming effort, or disregard the runtime execution pattern and transmit too much unnecessary data, resulting in wasted bandwidth and energy. To address these challenges, we present three novel resource sharing frameworks that utilize the various system resources of a remote or personal cloud to enhance mobile capacity in a generic and efficient manner. First, we propose a novel method-level offloading methodology to run mobile computational workloads on the remote cloud CPU; data transmission during such offloading is minimized by identifying and selectively migrating only the memory contexts necessary for the method's execution. Second, we present a systematic framework that maximizes mobile graphics-rendering performance with the remote cloud GPU, reusing redundant pixels across consecutive frames to reduce the transmitted frame data. Last, we propose to exploit unified mobile OS services and generically interconnect heterogeneous mobile devices into a personal mobile cloud whose devices complement each other and flexibly share mobile peripherals (e.g., sensors, cameras).
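
    The method-level offloading decision in the first framework reduces, at its core, to comparing local execution time against remote execution plus the cost of shipping the selectively migrated memory context. The sketch below captures that trade-off; all cost figures and parameter names are illustrative assumptions, not the thesis's actual model.

```python
# Hypothetical sketch of a method-level offloading decision. The thesis
# migrates only the memory context a method actually touches; here that
# is abstracted as `context_bytes`, and all cost figures are invented.

def should_offload(local_ms, remote_ms, context_bytes, result_bytes,
                   uplink_bps, downlink_bps, rtt_ms=20.0):
    """Offload a method iff remote execution plus the transfer of its
    (selectively migrated) memory context beats running it locally."""
    up_ms = context_bytes * 8 / uplink_bps * 1000    # ship context up
    down_ms = result_bytes * 8 / downlink_bps * 1000  # ship results back
    remote_total = rtt_ms + up_ms + remote_ms + down_ms
    return remote_total < local_ms

# Example: a 120 ms method with a 200 kB reachable context over a
# 20 Mbit/s uplink; offloading wins only because the cloud is much faster.
print(should_offload(120.0, 15.0, 200_000, 4_000, 20e6, 40e6))  # True
```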

    A Study on Resource Allocation Schemes in Edge Cloud Computing

    Get PDF
    Doctoral thesis (Doctor of Engineering), The University of Kitakyushu. Edge and cloud computing are expected to serve as a novel network architecture supporting people's daily lives and work. In edge and cloud computing, users expect to receive a wide variety of services with different transmission-rate requirements at various locations, so several important issues arise regarding efficient service provision that meets user requirements independently of location: (1) fair service provision in cloud computing, (2) service provision according to user requirements in edge computing, and (3) efficient transmission in edge and cloud computing. In this study, I propose efficient schemes to allocate and use network resources among flows to solve these issues in edge and cloud computing, and show the effectiveness of the proposed schemes through simulation evaluations.
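
    The abstract does not detail its allocation schemes, but issue (1), fair service provision, is classically addressed by max-min fairness; as a purely illustrative baseline (not taken from the thesis), the sketch below computes the max-min fair, water-filling allocation of a link's capacity among competing flows.

```python
# Illustrative max-min fair (water-filling) allocation among flows; a
# classic baseline for fair provisioning, not the thesis's own scheme.

def max_min_fair(capacity, demands):
    """Repeatedly offer every unsatisfied flow an equal share; flows that
    demand less are capped and their leftover capacity redistributed."""
    alloc = [0.0] * len(demands)
    remaining = list(range(len(demands)))
    cap = capacity
    while remaining and cap > 1e-12:
        share = cap / len(remaining)
        small = [i for i in remaining if demands[i] <= share]
        if not small:                 # everyone absorbs the equal share
            for i in remaining:
                alloc[i] += share
            break
        for i in small:               # cap small flows at their demand
            cap -= demands[i]
            alloc[i] = demands[i]
            remaining.remove(i)
    return alloc

print(max_min_fair(10.0, [2.0, 3.0, 8.0]))   # -> [2.0, 3.0, 5.0]
```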

    Distributed deep learning inference in fog networks

    Get PDF
    Today's smart devices are equipped with powerful integrated chips and built-in heterogeneous sensors that can leverage their potential to execute heavy computation and produce a large amount of sensor data. For instance, modern smart cameras integrate artificial intelligence to capture images, detect any objects in the scene, and change parameters, such as contrast and color, based on environmental conditions. The accuracy of object recognition and classification achieved by intelligent applications has improved due to recent advancements in artificial intelligence (AI) and machine learning (ML), particularly deep neural networks (DNNs). Despite being capable of some AI/ML computation, smart devices have limited battery power and computing resources, so DNN computation is generally offloaded to powerful computing nodes such as cloud servers. However, it is challenging to satisfy latency, reliability, and bandwidth constraints in cloud-based AI. Thus, in recent years, AI services and tasks have been pushed closer to the end users by taking advantage of the fog computing paradigm to meet these requirements. Generally, the trained DNN models are offloaded to fog devices for inference; this is accomplished by partitioning the DNN and distributing the computation across the fog network. This thesis addresses offloading DNN inference by dividing and distributing a pre-trained network onto heterogeneous embedded devices. Specifically, it implements the adaptive partitioning and offloading algorithm based on matching theory proposed in the article titled "Distributed inference acceleration with adaptive DNN partitioning and offloading". The implementation was evaluated in a fog testbed including NVIDIA Jetson Nano devices. The obtained results show that the adaptive solution outperforms other schemes (Random and Greedy) with respect to computation time and communication latency.
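
    The cited algorithm builds on matching theory; as a hedged approximation of the flavor of such schemes, the sketch below runs a one-to-one deferred-acceptance round in which DNN partitions propose to fog nodes in order of estimated completion time. The cost model and data structures are invented for illustration and are not the paper's actual formulation.

```python
# Hypothetical one-to-one deferred-acceptance matching of DNN partitions
# (tasks) to fog nodes. Tasks are dicts with 'id', 'input_bytes', 'flops';
# nodes are dicts with 'id', 'flops_per_ms'. All costs are illustrative.

def completion_ms(task, node, bandwidth_bps):
    transfer = task["input_bytes"] * 8 / bandwidth_bps * 1000
    compute = task["flops"] / node["flops_per_ms"]
    return transfer + compute

def match(tasks, nodes, bandwidth_bps):
    """Each unmatched task proposes to its next-best node; a node holds
    the fastest-completing proposal it has seen so far."""
    prefs = {t["id"]: sorted(nodes,
                             key=lambda n: completion_ms(t, n, bandwidth_bps))
             for t in tasks}
    next_choice = {t["id"]: 0 for t in tasks}
    held = {}                          # node id -> (cost, task)
    free = list(tasks)
    while free:
        t = free.pop()
        ranked = prefs[t["id"]]
        if next_choice[t["id"]] >= len(ranked):
            continue                   # t exhausted all nodes, stays local
        n = ranked[next_choice[t["id"]]]
        next_choice[t["id"]] += 1
        cost = completion_ms(t, n, bandwidth_bps)
        if n["id"] not in held or cost < held[n["id"]][0]:
            if n["id"] in held:
                free.append(held[n["id"]][1])   # bump the held task
            held[n["id"]] = (cost, t)
        else:
            free.append(t)             # rejected; will try its next node
    return {nid: entry[1]["id"] for nid, entry in held.items()}
```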

    Combining Cloud and Mobile Computing for Machine Learning

    Full text link
    Although the computing power of mobile devices is increasing, machine learning models are also growing in size. This trend creates problems for mobile devices due to limitations like memory capacity and battery life. While many services, like ChatGPT and Midjourney, run all inference in the cloud, we believe a flexible and fine-grained task distribution is more desirable. In this work, we consider model segmentation as a solution for improving the user experience, dividing the computation between mobile devices and the cloud in a way that offloads the compute-heavy portion of the model while minimizing the required data transfer. We show that the division not only reduces the wait time for users but can also be fine-tuned to optimize the workload of the cloud. To achieve that, we design a scheduler that collects information about network quality, client device capability, and job requirements, and makes decisions to achieve consistent performance across a range of devices while reducing the work the cloud needs to perform. Comment: Ruiqi Xu and Tianchi Zhang contributed equally to this work
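
    A minimal sketch of the kind of per-request decision such a scheduler might make, assuming per-layer compute costs and measured bandwidth are available: choose the segmentation point that minimizes the client's end-to-end latency, with a load factor that inflates cloud time so a busy cloud pushes work back onto capable clients. All names and the cost model are our assumptions, not the paper's.

```python
# Sketch of a segmentation decision under assumed per-layer costs.
# act_bytes has n+1 entries: act_bytes[s] is what must be uploaded if the
# client runs layers [0, s); act_bytes[n] is 0 (the result stays local).

def choose_segmentation(layer_flops, act_bytes, client_flops_per_ms,
                        cloud_flops_per_ms, bandwidth_bps, cloud_load=1.0):
    """Return (split, latency_ms) minimizing the client's wait. A
    cloud_load factor > 1 models a busy cloud, nudging the scheduler to
    keep more layers on capable clients and even out cloud workload."""
    n = len(layer_flops)
    best = (0, float("inf"))
    for split in range(n + 1):
        client_ms = sum(layer_flops[:split]) / client_flops_per_ms
        cloud_ms = sum(layer_flops[split:]) / cloud_flops_per_ms * cloud_load
        transfer_ms = act_bytes[split] * 8 / bandwidth_bps * 1000
        total = client_ms + transfer_ms + cloud_ms
        if total < best[1]:
            best = (split, total)
    return best
```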