835 research outputs found

    BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

    As deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications, it is critical for edge inference platforms to achieve both high throughput and low latency at the same time. Such edge platforms with multiple DNN models pose new challenges for scheduler designs. First, each request may have different service level objectives (SLOs) to improve quality of service (QoS). Second, the edge platforms should be able to efficiently schedule multiple heterogeneous DNN models so that system utilization can be improved. To meet these two goals, this paper proposes BCEdge, a novel learning-based scheduling framework that performs adaptive batching and concurrent execution of DNN inference services on edge platforms. We define a utility function to evaluate the trade-off between throughput and latency. The scheduler in BCEdge leverages maximum entropy-based deep reinforcement learning (DRL) to maximize utility by automatically co-optimizing 1) the batch size and 2) the number of concurrent models. Our prototype implemented on different edge platforms shows that the proposed BCEdge enhances utility by up to 37.6% on average, compared to state-of-the-art solutions, while satisfying SLOs.
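The abstract names a throughput/latency utility but does not give its form. As an illustration only, the sketch below assumes a hypothetical utility equal to throughput divided by latency, forced to zero when latency exceeds the SLO, on top of a toy latency model whose constants (`BASE_MS`, `PER_ITEM_MS`, `CONTENTION`) are invented. It picks the batch size and number of concurrent model instances by exhaustive search, which is a simple stand-in for, and not a reproduction of, BCEdge's maximum entropy-based DRL scheduler.

```python
from itertools import product

# All constants below are hypothetical; BCEdge's real latency profile and
# utility function are not given in the abstract.
BASE_MS = 5.0        # fixed per-batch overhead
PER_ITEM_MS = 1.5    # extra latency per request in the batch
CONTENTION = 0.3     # slowdown per additional concurrent model instance


def latency_ms(batch_size: int, concurrency: int) -> float:
    """Toy latency model: grows with batch size and with contention."""
    return (BASE_MS + PER_ITEM_MS * batch_size) * (1.0 + CONTENTION * (concurrency - 1))


def throughput_rps(batch_size: int, concurrency: int) -> float:
    """Requests/s when each instance finishes one batch per latency_ms()."""
    return concurrency * batch_size * 1000.0 / latency_ms(batch_size, concurrency)


def utility(batch_size: int, concurrency: int, slo_ms: float) -> float:
    """Hypothetical utility: throughput per ms of latency, 0 if the SLO is violated."""
    lat = latency_ms(batch_size, concurrency)
    if lat > slo_ms:
        return 0.0
    return throughput_rps(batch_size, concurrency) / lat


def best_config(slo_ms: float,
                batch_sizes=(1, 2, 4, 8, 16, 32),
                concurrencies=(1, 2, 3, 4)) -> tuple[int, int]:
    """Exhaustive search over (batch size, concurrency); a stand-in for a learned policy."""
    return max(product(batch_sizes, concurrencies),
               key=lambda cfg: utility(cfg[0], cfg[1], slo_ms))


print(best_config(slo_ms=50.0))  # loose SLO -> (4, 2)
print(best_config(slo_ms=10.0))  # tight SLO -> (2, 1)
```

Under this toy model, larger batches raise throughput only until their added latency outweighs the gain, and a tighter SLO shrinks the feasible set toward smaller batches and fewer concurrent instances.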

    System Optimisation for Multi-access Edge Computing Based on Deep Reinforcement Learning

    Multi-access edge computing (MEC) is an emerging and important distributed computing paradigm that aims to extend cloud services to the network edge to reduce network traffic and service latency. Proper system optimisation and maintenance are crucial to maintaining a high quality of service (QoS) for end users. However, with the increasing complexity of MEC architectures and mobile applications, effectively optimising MEC systems is non-trivial. Traditional optimisation methods are generally based on simplified mathematical models and fixed heuristics, which rely heavily on expert knowledge. As a consequence, when facing dynamic MEC scenarios, considerable human effort and expertise are required to redesign the model and tune the heuristics, which is time-consuming. This thesis aims to develop deep reinforcement learning (DRL) methods to handle system optimisation problems in MEC. Instead of developing fixed heuristic algorithms for these problems, this thesis designs DRL-based methods that enable systems to learn optimal solutions on their own. This research demonstrates the effectiveness of DRL-based methods on two crucial system optimisation problems: task offloading and service migration. Specifically, this thesis first investigates the dependent task offloading problem, which considers the inner dependencies among tasks. This research builds a DRL-based method combined with a sequence-to-sequence (seq2seq) neural network to address the problem. Experimental results demonstrate that our method outperforms the existing heuristic algorithms and achieves near-optimal performance. To further enhance the learning efficiency of the DRL-based task offloading method for unseen learning tasks, this thesis then integrates meta reinforcement learning into the task offloading method. Our method can adapt quickly to new environments with a small number of gradient updates and samples.
Finally, this thesis applies a DRL-based solution to the service migration problem in MEC under user mobility. This research models the service migration problem as a Partially Observable Markov Decision Process (POMDP) and proposes a tailored actor-critic algorithm combined with Long Short-Term Memory (LSTM) to solve the POMDP. Results from extensive experiments based on real-world mobility traces demonstrate that our method consistently outperforms both heuristic and state-of-the-art learning-driven algorithms in various MEC scenarios.
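The dependent task offloading problem above is stated without its cost model. The sketch below assumes a hypothetical one: the application is a small task DAG (`DEPS`), each task runs either on the device or on the edge server with invented run times, and moving data across the device/edge boundary costs a fixed `TRANSFER_MS`. It scores an offloading plan by its makespan, evaluated in topological order, and finds the optimum by brute force on this tiny instance. In the thesis, a seq2seq DRL policy produces the decisions instead; this only illustrates the kind of objective such a policy optimises, not the thesis's method.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical 4-task application: task -> set of predecessor tasks.
DEPS = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
LOCAL_MS = {"a": 1.0, "b": 60.0, "c": 15.0, "d": 3.0}  # run time on the device
EDGE_MS = {"a": 2.0, "b": 10.0, "c": 12.0, "d": 2.0}   # run time on the edge server
TRANSFER_MS = 10.0  # moving data between device and edge (also the initial upload)


def makespan(offload: dict[str, bool]) -> float:
    """Finish time of the last task; offload[t] is True if t runs on the edge.

    Assumes the input starts on the device and that both sides can run
    any number of ready tasks in parallel (no queueing).
    """
    finish: dict[str, float] = {}
    for task in TopologicalSorter(DEPS).static_order():
        # Entry tasks that are offloaded must first upload the input.
        ready = TRANSFER_MS if offload[task] and not DEPS[task] else 0.0
        for pred in DEPS[task]:
            hop = TRANSFER_MS if offload[pred] != offload[task] else 0.0
            ready = max(ready, finish[pred] + hop)
        finish[task] = ready + (EDGE_MS[task] if offload[task] else LOCAL_MS[task])
    return max(finish.values())


def best_offloading() -> dict[str, bool]:
    """Brute-force optimum over all 2**n decisions (feasible only for tiny DAGs)."""
    tasks = list(DEPS)
    bits = min(product([False, True], repeat=len(tasks)),
               key=lambda choice: makespan(dict(zip(tasks, choice))))
    return dict(zip(tasks, bits))


plan = best_offloading()
print(plan, makespan(plan))  # {'a': False, 'b': True, 'c': True, 'd': True} 25.0
```

With these invented timings, the heavy task `b` pulls its successors onto the edge, while the cheap entry task `a` stays on the device to avoid paying the upload before any work starts. Exhaustive search grows as 2^n in the number of tasks, so it only serves as a reference point on very small instances.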

    Smart Decision-Making via Edge Intelligence for Smart Cities

    Smart cities are an ambitious vision for future urban environments. The ultimate aim of smart cities is to use modern technology to optimize city resources and operations while improving the overall quality of life of their citizens. Realizing this ambitious vision will require embracing advancements in information and communication technology, data analysis, and other technologies. Because smart cities naturally produce vast amounts of data, recent artificial intelligence (AI) techniques are of interest due to their ability to transform raw data into insightful knowledge to inform decisions (e.g., using live road traffic data to control traffic lights based on current traffic conditions). However, training and serving these AI applications is non-trivial and will require sufficient computing resources. Traditionally, cloud computing infrastructure has been used to process computationally intensive tasks; however, due to the time sensitivity of many of these smart city applications, novel computing hardware and technologies are required. The recent advent of edge computing provides a promising computing infrastructure to support the needs of the smart cities of tomorrow. Edge computing pushes compute resources close to end users to provide reduced latency and improved scalability, making it a viable candidate to support smart cities. However, it comes with hardware limitations that must be considered. This thesis explores the use of the edge computing paradigm for smart city applications and how to make efficient, smart decisions related to their available resources, while considering the quality of service provided to end users. This work consists of four parts. First, this work addresses how to optimally place and serve AI-based applications on edge computing infrastructure to maximize the quality of service to end users. This is cast as an optimization problem and solved with efficient algorithms that approximate the optimal solution.
Second, this work investigates the applicability of compression techniques to reduce offloading costs for AI-based applications in edge computing systems. Finally, this thesis demonstrates how edge computing can support AI-based solutions for smart city applications, namely smart energy and smart traffic. These applications are approached using the recent paradigm of federated learning. The contributions of this thesis include the design of novel algorithms and system design strategies for the placement and scheduling of AI-based services on edge computing systems, a formal formulation of the trade-offs between delivered AI model performance and latency, compression for offloading decisions to reduce communication, and an evaluation of federated learning-based approaches for smart city applications.

    Trustworthy Edge Machine Learning: A Survey

    The convergence of Edge Computing (EC) and Machine Learning (ML), known as Edge Machine Learning (EML), has become a highly regarded research area that utilizes distributed network resources to perform joint training and inference in a cooperative manner. However, EML faces various challenges due to resource constraints, heterogeneous network environments, and the diverse service requirements of different applications, which together affect the trustworthiness of EML in the eyes of its stakeholders. This survey provides a comprehensive summary of definitions, attributes, frameworks, techniques, and solutions for trustworthy EML. Specifically, we first emphasize the importance of trustworthy EML within the context of Sixth-Generation (6G) networks. We then discuss the necessity of trustworthiness from the perspective of challenges encountered during deployment and real-world application scenarios. Subsequently, we provide a preliminary definition of trustworthy EML and explore its key attributes. Following this, we introduce fundamental frameworks and enabling technologies for trustworthy EML systems, and provide an in-depth literature review of the latest solutions to enhance the trustworthiness of EML. Finally, we discuss corresponding research challenges and open issues.
    Comment: 27 pages, 7 figures, 10 tables

    Towards GPU Utilization Prediction for Cloud Deep Learning

    Understanding the GPU utilization of Deep Learning (DL) workloads is important for improving resource efficiency and cost-benefit decision making for DL frameworks in the cloud. Current approaches to determining DL workload GPU utilization rely on online profiling within isolated GPU devices, and must be performed for every unique DL workload submission, resulting in resource under-utilization and reduced service availability. In this paper, we propose a prediction engine to proactively determine the GPU utilization of heterogeneous DL workloads without the need for in-depth or isolated online profiling. We demonstrate that it is possible to predict DL workload GPU utilization by extracting information from its model computation graph. Our experiments show that the prediction engine achieves a root mean squared logarithmic error (RMSLE) of 0.154, and can be exploited by DL schedulers to achieve up to 61.5% improvement in GPU cluster utilization.
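The accuracy figure above is an RMSLE. Under the conventional definition, RMSLE = sqrt((1/n) Σ_i (ln(1 + ŷ_i) − ln(1 + y_i))²), where y_i is the measured value and ŷ_i the predicted one; because it compares logarithms, it penalizes roughly relative rather than absolute error. The sketch below implements that standard definition. The abstract does not say what units the utilization values use, so the sample numbers are hypothetical.

```python
import math
from typing import Sequence


def rmsle(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared logarithmic error; all values must be greater than -1."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) computes ln(1 + x) accurately, including for small x.
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical GPU utilization values (percent) for three workloads.
measured = [30.0, 60.0, 90.0]
forecast = [33.0, 55.0, 95.0]
print(round(rmsle(measured, forecast), 3))
```

A perfect prediction gives 0, and the same absolute miss costs less on a large value than on a small one, which suits utilization values that span a wide range.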