MSF-Model: Modeling Metastable Failures in Replicated Storage Systems
Metastable failure is a recent abstraction of a pattern of failures that
occurs frequently in real-world distributed storage systems. In this paper, we
propose a formal analysis and modeling of metastable failures in replicated
storage systems. We focus on a foundational problem in distributed systems --
the problem of consensus -- to have an impact on a large class of systems. Our
main contribution is the development of a queuing-based analytical model,
MSF-Model, that can be used to characterize and predict metastable failures.
MSF-Model integrates novel modeling concepts that make it possible to model
metastable failures, which were intractable to model prior to our work. We also
perform real experiments to reproduce and validate our model. Comparing these
experiments with the predictions of the queuing-based model shows that
MSF-Model predicts metastable failures with high accuracy.
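The retry-driven feedback loop behind many metastable failures can be illustrated with a toy discrete-time queue (a hypothetical sketch for intuition, not the MSF-Model itself; all parameters are made up): once a temporary load spike pushes queueing delay past the client timeout, retries add duplicate work, and the system stays overloaded even after the trigger is gone.

```python
def simulate(steps, arrivals, capacity, spike_at, spike_len, spike_arrivals, timeout):
    """Toy fluid queue: requests that would wait longer than the client
    timeout are retried, adding duplicate load on top of fresh arrivals."""
    backlog = 0.0
    history = []
    for t in range(steps):
        load = spike_arrivals if spike_at <= t < spike_at + spike_len else arrivals
        wait = backlog / capacity                 # approximate queueing delay
        retries = backlog if wait > timeout else 0.0
        backlog = max(0.0, backlog + load + retries - capacity)
        history.append(backlog)
    return history

# Identical temporary spike; only the client retry timeout differs.
metastable = simulate(80, arrivals=8, capacity=10, spike_at=10, spike_len=5,
                      spike_arrivals=30, timeout=3)
healthy = simulate(80, arrivals=8, capacity=10, spike_at=10, spike_len=5,
                   spike_arrivals=30, timeout=float("inf"))
# With retries the backlog keeps growing long after the spike ends (the
# metastable state); without them the queue drains back to empty.
```

The point of the sketch is the sustaining effect the paper models: the trigger (the spike) is transient, but the retry feedback keeps the system in the degraded state indefinitely.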
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach---declarative
in the large, trusting the programmer's ability to utilize PC object model
efficiently in the small---results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex object manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in a
speedup of 2x to more than 50x compared to equivalent implementations on Spark.
IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency
Efficiently optimizing multi-model inference pipelines for fast, accurate,
and cost-effective inference is a crucial challenge in ML production systems,
given their tight end-to-end latency requirements. To simplify the exploration
of the vast and intricate trade-off space of accuracy and cost in inference
pipelines, providers frequently opt to consider only one of them. However, the
challenge lies in reconciling accuracy and cost trade-offs. To address this
challenge and propose a solution to efficiently manage model variants in
inference pipelines, we present IPA, an online deep-learning Inference Pipeline
Adaptation system that efficiently leverages model variants for each deep
learning task. Model variants are different versions of pre-trained models for
the same deep learning task with variations in resource requirements, latency,
and accuracy. IPA dynamically configures batch size, replication, and model
variants to optimize accuracy, minimize costs, and meet user-defined latency
SLAs using Integer Programming. It supports multi-objective settings for
achieving different trade-offs between accuracy and cost objectives while
remaining adaptable to varying workloads and dynamic traffic patterns.
Extensive experiments on a Kubernetes implementation with five real-world
inference pipelines demonstrate that IPA improves normalized accuracy by up to
35% with a minimal cost increase of less than 5%.
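The decision IPA makes each adaptation step can be sketched as a search over a discrete configuration space. The variant profiles, latency model, and numbers below are entirely hypothetical; the paper formulates the selection as an Integer Program, whereas this sketch uses exhaustive search over the same kind of discrete space purely for clarity.

```python
from itertools import product

# Hypothetical variant profiles for one task: (name, accuracy, base latency ms, cost/replica).
VARIANTS = [("resnet18", 0.70, 20, 1.0),
            ("resnet50", 0.76, 45, 2.0),
            ("resnet152", 0.78, 90, 4.0)]
BATCH_SIZES = [1, 4, 8]
REPLICAS = [1, 2, 4]

def best_config(arrival_rate, latency_slo_ms, cost_budget):
    """Maximize accuracy (then minimize cost) subject to the latency SLA,
    throughput demand, and budget. IPA poses this as an Integer Program;
    brute force over the same discrete choices is shown for illustration."""
    best = None
    for (name, acc, lat, cost), b, r in product(VARIANTS, BATCH_SIZES, REPLICAS):
        latency = lat + 5.0 * (b - 1)             # assumed sub-linear batching cost
        throughput = r * b * 1000.0 / latency     # requests/sec across replicas
        total_cost = cost * r
        if (latency <= latency_slo_ms and throughput >= arrival_rate
                and total_cost <= cost_budget):
            cand = (acc, -total_cost, name, b, r)  # accuracy first, then cheapness
            if best is None or cand > best:
                best = cand
    return best

cfg = best_config(arrival_rate=100, latency_slo_ms=100, cost_budget=8)
```

Under these made-up profiles, the largest variant violates the latency SLA at any feasible batch size, so the search settles on a mid-sized variant with a large batch and a single replica, which is the kind of accuracy/cost compromise the multi-objective formulation in IPA navigates.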
Keep It Simple: Fault Tolerance Evaluation of Federated Learning with Unreliable Clients
Federated learning (FL), as an emerging artificial intelligence (AI)
approach, enables decentralized model training across multiple devices without
exposing their local training data. FL has been increasingly gaining popularity
in both academia and industry. While research works have been proposed to
improve the fault tolerance of FL, the real impact of unreliable devices (e.g.,
dropping out, misconfiguration, poor data quality) in real-world applications
is not fully investigated. We carefully chose two representative, real-world
classification problems with a limited number of clients to better analyze FL
fault tolerance. Contrary to intuition, simple FL algorithms can perform
surprisingly well in the presence of unreliable clients.
[Solution] IPA: Inference Pipeline Adaptation to achieve high accuracy and cost-efficiency
Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in machine learning production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of latency, accuracy, and cost in inference pipelines, providers frequently opt to consider one of them. However, the challenge lies in reconciling latency, accuracy, and cost trade-offs. To address this challenge and propose a solution to efficiently manage model variants in inference pipelines, we present IPA, an online deep learning Inference Pipeline Adaptation system that efficiently leverages model variants for each deep learning task. Model variants are different versions of pre-trained models for the same deep learning task with variations in resource requirements, latency, and accuracy. IPA dynamically configures batch size, replication, and model variants to optimize accuracy, minimize costs, and meet user-defined latency Service Level Agreements (SLAs) using Integer Programming. It supports multi-objective settings for achieving different trade-offs between accuracy and cost objectives while remaining adaptable to varying workloads and dynamic traffic patterns. Navigating a wider variety of configurations allows IPA to achieve better trade-offs between cost and accuracy objectives compared to existing methods. Extensive experiments in a Kubernetes implementation with five real-world inference pipelines demonstrate that IPA improves end-to-end accuracy by up to 21% with a minimal cost increase. The code and data for replications are available at https://github.com/reconfigurable-ml-pipeline/ipa