Architectural Vision for Quantum Computing in the Edge-Cloud Continuum
Quantum processing units (QPUs) are currently available exclusively from
cloud vendors. However, with recent advancements, hosting QPUs will soon be
possible everywhere. Existing work has yet to draw on research in edge computing to
explore systems exploiting mobile QPUs, or how hybrid applications can benefit
from distributed heterogeneous resources. Hence, this work presents an
architecture for Quantum Computing in the edge-cloud continuum. We discuss the
necessity, challenges, and solution approaches for extending existing work on
classical edge computing to integrate QPUs. We describe how warm-starting
allows defining workflows that exploit the hierarchical resources spread across
the continuum. Then, we introduce a distributed inference engine with hybrid
classical-quantum neural networks (QNNs) to aid system designers in
accommodating applications with complex requirements that incur the highest
degree of heterogeneity. We propose solutions focusing on classical layer
partitioning and quantum circuit cutting to demonstrate the potential of
utilizing classical and quantum computation across the continuum. To evaluate
the importance and feasibility of our vision, we provide a proof of concept
that exemplifies how extending a classical partition method to integrate
quantum circuits can improve the solution quality. Specifically, we implement a
split neural network with optional hybrid QNN predictors. Our results show that
extending classical methods with QNNs is viable and promising for future work.Comment: 16 pages, 5 figures, Vision Pape
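The split-network idea above can be illustrated with a purely classical toy sketch: a two-layer network is cut into a head that runs near the data and a tail predictor that runs remotely, and composing the two halves reproduces the monolithic forward pass. All shapes and weights here are hypothetical; in the paper's setting the tail slot could instead host a hybrid QNN predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer weights for a tiny 4 -> 3 -> 2 network.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))

def relu(x):
    return np.maximum(x, 0.0)

def full_forward(x):
    # Monolithic forward pass: both layers on one device.
    return relu(x @ W1) @ W2

def head_forward(x):
    # Edge-side partition: runs the first layer near the data and
    # ships only the intermediate activation over the network.
    return relu(x @ W1)

def tail_predictor(h):
    # Remote predictor consuming the activation; this is the slot a
    # hybrid QNN predictor could occupy.
    return h @ W2

x = rng.normal(size=(1, 4))
split_out = tail_predictor(head_forward(x))
assert np.allclose(split_out, full_forward(x))  # the cut is lossless
```

The cut point determines how much data crosses the continuum: only the intermediate activation leaves the edge, not the raw input.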
ChainsFormer: A Chain Latency-aware Resource Provisioning Approach for Microservices Cluster
The trend towards transitioning from monolithic applications to microservices
has been widely embraced in modern distributed systems and applications. This
shift has resulted in the creation of lightweight, fine-grained, and
self-contained microservices. Multiple microservices can be linked together via
calls and inter-dependencies to form complex functions. One of the challenges
in managing microservices is provisioning the optimal amount of resources for
microservices in the chain to ensure application performance while improving
resource usage efficiency. This paper presents ChainsFormer, a framework that
analyzes microservice inter-dependencies to identify critical chains and nodes,
and provision resources based on reinforcement learning. To analyze chains,
ChainsFormer utilizes lightweight machine learning techniques to address the
dynamic nature of microservice chains and workloads. For resource provisioning,
a reinforcement learning approach is used that combines vertical and horizontal
scaling to determine the amount of allocated resources and the number of
replicas. We evaluate the effectiveness of ChainsFormer using realistic
applications and traces on a real testbed based on Kubernetes. Our experimental
results demonstrate that ChainsFormer can reduce response time by up to 26% and
improve processed requests per second by 8% compared with state-of-the-art
techniques.

Comment: 15 pages
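The critical-chain analysis the abstract describes can be sketched as a longest-latency-path search over the microservice call graph. The graph, service names, and per-service latencies below are all hypothetical; a real system would derive them from tracing data, and ChainsFormer additionally accounts for dynamic workloads.

```python
from functools import lru_cache

# Hypothetical microservice call graph and per-service latencies (ms).
graph = {
    "gateway": ["auth", "catalog"],
    "auth": ["db"],
    "catalog": ["db", "cache"],
    "db": [],
    "cache": [],
}
latency_ms = {"gateway": 5, "auth": 12, "catalog": 8, "db": 20, "cache": 2}

@lru_cache(maxsize=None)
def critical(node):
    # Slowest (latency-critical) downstream path starting at `node`,
    # returned as (total latency, chain of services).
    best = (0, ())
    for child in graph[node]:
        best = max(best, critical(child))
    return (latency_ms[node] + best[0], (node,) + best[1])

total, chain = critical("gateway")
# `chain` is the natural target for provisioning extra resources.
```

Identifying the chain is only the first step; the paper then uses reinforcement learning to decide how much vertical and horizontal scaling to apply along it.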
Providing Transaction Class-Based QoS in In-Memory Data Grids via Machine Learning
Elastic architectures and the "pay-as-you-go" resource pricing model offered by many cloud infrastructure providers may seem the right choice for companies dealing with data-centric applications characterized by highly variable workloads. In this context, in-memory transactional data grids have proven particularly well suited to exploiting the advantages of elastic computing platforms, mainly thanks to their ability to be dynamically (re-)sized and tuned. However, when specific QoS requirements must be met, such architectures have proven complex for humans to manage. In particular, managing them is very difficult without mechanisms supporting run-time automatic sizing and tuning of the data platform and of the underlying (virtual) hardware resources provided by the cloud. In this paper, we present a neural network-based architecture in which the system is constantly and automatically re-configured, particularly in terms of computing resources.
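The control loop implied by the abstract can be sketched as follows. The paper obtains the performance forecast from a neural network; in this toy version the prediction is simply an input, and the function name, scaling bounds, and thresholds are all illustrative assumptions, not the paper's actual policy.

```python
def reconfigure(predicted_latency_ms, slo_ms, nodes, min_nodes=2, max_nodes=16):
    """Toy controller: grow the data grid when the predicted latency
    violates the SLO, shrink when there is ample headroom. The paper
    derives `predicted_latency_ms` from a neural network model of the
    platform; all thresholds here are hypothetical."""
    if predicted_latency_ms > slo_ms and nodes < max_nodes:
        return nodes + 1   # scale out to restore QoS
    if predicted_latency_ms < 0.5 * slo_ms and nodes > min_nodes:
        return nodes - 1   # scale in to cut pay-as-you-go cost
    return nodes           # within band: keep the current configuration
```

Running such a loop periodically yields the constant automatic re-configuration the paper targets, with the predictor rather than a human operator driving the sizing decisions.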
Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs
Deploying deep learning models in cloud clusters provides efficient and
prompt inference services to accommodate the widespread application of deep
learning. These clusters are usually equipped with host CPUs and accelerators
with distinct responsibilities to handle serving requests, i.e. general-purpose
CPUs for input preprocessing and domain-specific GPUs for forward computation.
Recurrent neural networks play an essential role in handling temporal inputs
and display distinctive computation characteristics because of their high
inter-operator parallelism. Hence, we propose Chrion to optimize recurrent
neural network inference by collaboratively utilizing CPUs and GPUs. We
formulate the model deployment in the CPU-GPU cluster as an NP-hard scheduling
problem of directed acyclic graphs on heterogeneous devices. Given an input
model in the ONNX format and a user-defined SLO requirement, Chrion first
preprocesses the model by parsing and profiling it, and then partitions the
graph to select execution devices for each operator. When an online request
arrives, Chrion performs forward computation according to the graph partition
by executing the operators on the CPU and GPU in parallel. Our experimental
results show that the execution time can be reduced by 19.4% at most in the
latency-optimal pattern and GPU memory footprint by 67.5% in the memory-optimal
pattern compared with execution on the GPU alone.
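The partitioning step described above is NP-hard in general; a minimal greedy sketch (not the paper's actual algorithm) conveys the idea: profile each operator on each device, then assign operators to the GPU by speedup until a budget is exhausted. The operator names and costs below are hypothetical.

```python
# Hypothetical per-operator execution costs (ms) on each device; the
# paper's profiling stage would supply real numbers.
op_cost = {
    "embed":  {"cpu": 4.0, "gpu": 1.0},
    "lstm_f": {"cpu": 6.0, "gpu": 2.5},
    "lstm_b": {"cpu": 6.0, "gpu": 2.5},
    "dense":  {"cpu": 3.0, "gpu": 0.8},
}

def place(op_cost, gpu_budget_ms):
    """Greedy stand-in for the partitioning step: put operators on the
    GPU in order of CPU-vs-GPU speedup until a budget (a crude proxy
    for GPU memory) runs out; the rest execute on the CPU."""
    placement = {op: "cpu" for op in op_cost}
    used = 0.0
    by_gain = sorted(op_cost,
                     key=lambda o: op_cost[o]["cpu"] - op_cost[o]["gpu"],
                     reverse=True)
    for op in by_gain:
        if used + op_cost[op]["gpu"] <= gpu_budget_ms:
            placement[op] = "gpu"
            used += op_cost[op]["gpu"]
    return placement

plan = place(op_cost, gpu_budget_ms=5.0)
```

Shrinking the budget pushes more operators to the CPU, trading latency for GPU memory, which mirrors the latency-optimal versus memory-optimal patterns the evaluation reports.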