Architectural Vision for Quantum Computing in the Edge-Cloud Continuum
Quantum processing units (QPUs) are currently available exclusively from
cloud vendors. However, with recent advancements, hosting QPUs will soon be
possible everywhere. Existing work has yet to draw on research in edge computing to
explore systems exploiting mobile QPUs, or how hybrid applications can benefit
from distributed heterogeneous resources. Hence, this work presents an
architecture for Quantum Computing in the edge-cloud continuum. We discuss the
necessity, challenges, and solution approaches for extending existing work on
classical edge computing to integrate QPUs. We describe how warm-starting
allows defining workflows that exploit the hierarchical resources spread across
the continuum. Then, we introduce a distributed inference engine with hybrid
classical-quantum neural networks (QNNs) to aid system designers in
accommodating applications with complex requirements that incur the highest
degree of heterogeneity. We propose solutions focusing on classical layer
partitioning and quantum circuit cutting to demonstrate the potential of
utilizing classical and quantum computation across the continuum. To evaluate
the importance and feasibility of our vision, we provide a proof of concept
that exemplifies how extending a classical partition method to integrate
quantum circuits can improve the solution quality. Specifically, we implement a
split neural network with optional hybrid QNN predictors. Our results show that
extending classical methods with QNNs is viable and promising for future work.Comment: 16 pages, 5 figures, Vision Pape
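The split-network idea above can be illustrated with a purely classical toy sketch: a two-layer network is cut into a head that runs near the data and a tail predictor that runs remotely, and composing the two halves reproduces the monolithic forward pass. All shapes and weights here are hypothetical; in the paper's setting the tail slot could instead host a hybrid QNN predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer weights for a tiny 4 -> 3 -> 2 network.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))

def relu(x):
    return np.maximum(x, 0.0)

def full_forward(x):
    # Monolithic forward pass: both layers on one device.
    return relu(x @ W1) @ W2

def head_forward(x):
    # Edge-side partition: runs the first layer near the data and
    # ships only the intermediate activation over the network.
    return relu(x @ W1)

def tail_predictor(h):
    # Remote predictor consuming the activation; this is the slot a
    # hybrid QNN predictor could occupy.
    return h @ W2

x = rng.normal(size=(1, 4))
split_out = tail_predictor(head_forward(x))
assert np.allclose(split_out, full_forward(x))  # the cut is lossless
```

The cut point determines how much data crosses the continuum: only the intermediate activation leaves the edge, not the raw input.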
ChainsFormer: A Chain Latency-aware Resource Provisioning Approach for Microservices Cluster
The trend towards transitioning from monolithic applications to microservices
has been widely embraced in modern distributed systems and applications. This
shift has resulted in the creation of lightweight, fine-grained, and
self-contained microservices. Multiple microservices can be linked together via
calls and inter-dependencies to form complex functions. One of the challenges
in managing microservices is provisioning the optimal amount of resources for
microservices in the chain to ensure application performance while improving
resource usage efficiency. This paper presents ChainsFormer, a framework that
analyzes microservice inter-dependencies to identify critical chains and nodes,
and provision resources based on reinforcement learning. To analyze chains,
ChainsFormer utilizes lightweight machine learning techniques to address the
dynamic nature of microservice chains and workloads. For resource provisioning,
a reinforcement learning approach is used that combines vertical and horizontal
scaling to determine the amount of allocated resources and the number of
replicas. We evaluate the effectiveness of ChainsFormer using realistic
applications and traces on a real testbed based on Kubernetes. Our experimental
results demonstrate that ChainsFormer can reduce response time by up to 26% and
improve processed requests per second by 8% compared with state-of-the-art
techniques.

Comment: 15 pages
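The critical-chain analysis the abstract describes can be sketched as a longest-latency-path search over the microservice call graph. The graph, service names, and per-service latencies below are all hypothetical; a real system would derive them from tracing data, and ChainsFormer additionally accounts for dynamic workloads.

```python
from functools import lru_cache

# Hypothetical microservice call graph and per-service latencies (ms).
graph = {
    "gateway": ["auth", "catalog"],
    "auth": ["db"],
    "catalog": ["db", "cache"],
    "db": [],
    "cache": [],
}
latency_ms = {"gateway": 5, "auth": 12, "catalog": 8, "db": 20, "cache": 2}

@lru_cache(maxsize=None)
def critical(node):
    # Slowest (latency-critical) downstream path starting at `node`,
    # returned as (total latency, chain of services).
    best = (0, ())
    for child in graph[node]:
        best = max(best, critical(child))
    return (latency_ms[node] + best[0], (node,) + best[1])

total, chain = critical("gateway")
# `chain` is the natural target for provisioning extra resources.
```

Identifying the chain is only the first step; the paper then uses reinforcement learning to decide how much vertical and horizontal scaling to apply along it.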
Providing Transaction Class-Based QoS in In-Memory Data Grids via Machine Learning
Elastic architectures and the "pay-as-you-go" resource pricing model offered by many cloud infrastructure providers may seem the right choice for companies dealing with data-centric applications characterized by highly variable workloads. In this context, in-memory transactional data grids have proven particularly well suited to exploiting the advantages of elastic computing platforms, mainly thanks to their ability to be dynamically (re-)sized and tuned. However, when specific QoS requirements must be met, such architectures have proven complex for humans to manage. In particular, managing them is very difficult without mechanisms supporting run-time automatic sizing and tuning of the data platform and of the underlying (virtual) hardware resources provided by the cloud. In this paper, we present a neural network-based architecture in which the system is constantly and automatically re-configured, particularly in terms of computing resources.
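The control loop implied by the abstract can be sketched as follows. The paper obtains the performance forecast from a neural network; in this toy version the prediction is simply an input, and the function name, scaling bounds, and thresholds are all illustrative assumptions, not the paper's actual policy.

```python
def reconfigure(predicted_latency_ms, slo_ms, nodes, min_nodes=2, max_nodes=16):
    """Toy controller: grow the data grid when the predicted latency
    violates the SLO, shrink when there is ample headroom. The paper
    derives `predicted_latency_ms` from a neural network model of the
    platform; all thresholds here are hypothetical."""
    if predicted_latency_ms > slo_ms and nodes < max_nodes:
        return nodes + 1   # scale out to restore QoS
    if predicted_latency_ms < 0.5 * slo_ms and nodes > min_nodes:
        return nodes - 1   # scale in to cut pay-as-you-go cost
    return nodes           # within band: keep the current configuration
```

Running such a loop periodically yields the constant automatic re-configuration the paper targets, with the predictor rather than a human operator driving the sizing decisions.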
Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs
Deploying deep learning models in cloud clusters provides efficient and
prompt inference services to accommodate the widespread application of deep
learning. These clusters are usually equipped with host CPUs and accelerators
with distinct responsibilities to handle serving requests, i.e. general-purpose
CPUs for input preprocessing and domain-specific GPUs for forward computation.
Recurrent neural networks play an essential role in handling temporal inputs
and display distinctive computation characteristics because of their high
inter-operator parallelism. Hence, we propose Chrion to optimize recurrent
neural network inference by collaboratively utilizing CPUs and GPUs. We
formulate the model deployment in the CPU-GPU cluster as an NP-hard scheduling
problem of directed acyclic graphs on heterogeneous devices. Given an input
model in the ONNX format and a user-defined SLO requirement, Chrion first
preprocesses the model by parsing and profiling it, and then partitions the
graph to select execution devices for each operator. When an online request
arrives, Chrion performs forward computation according to the graph partition
by executing the operators on the CPU and GPU in parallel. Our experimental
results show that the execution time can be reduced by 19.4% at most in the
latency-optimal pattern and GPU memory footprint by 67.5% in the memory-optimal
pattern compared with execution on the GPU alone.
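The partitioning step described above is NP-hard in general; a minimal greedy sketch (not the paper's actual algorithm) conveys the idea: profile each operator on each device, then assign operators to the GPU by speedup until a budget is exhausted. The operator names and costs below are hypothetical.

```python
# Hypothetical per-operator execution costs (ms) on each device; the
# paper's profiling stage would supply real numbers.
op_cost = {
    "embed":  {"cpu": 4.0, "gpu": 1.0},
    "lstm_f": {"cpu": 6.0, "gpu": 2.5},
    "lstm_b": {"cpu": 6.0, "gpu": 2.5},
    "dense":  {"cpu": 3.0, "gpu": 0.8},
}

def place(op_cost, gpu_budget_ms):
    """Greedy stand-in for the partitioning step: put operators on the
    GPU in order of CPU-vs-GPU speedup until a budget (a crude proxy
    for GPU memory) runs out; the rest execute on the CPU."""
    placement = {op: "cpu" for op in op_cost}
    used = 0.0
    by_gain = sorted(op_cost,
                     key=lambda o: op_cost[o]["cpu"] - op_cost[o]["gpu"],
                     reverse=True)
    for op in by_gain:
        if used + op_cost[op]["gpu"] <= gpu_budget_ms:
            placement[op] = "gpu"
            used += op_cost[op]["gpu"]
    return placement

plan = place(op_cost, gpu_budget_ms=5.0)
```

Shrinking the budget pushes more operators to the CPU, trading latency for GPU memory, which mirrors the latency-optimal versus memory-optimal patterns the evaluation reports.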