    MSF-Model: Modeling Metastable Failures in Replicated Storage Systems

    Metastable failure is a recent abstraction of a pattern of failures that occurs frequently in real-world distributed storage systems. In this paper, we propose a formal analysis and modeling of metastable failures in replicated storage systems. We focus on a foundational problem in distributed systems, the problem of consensus, to have an impact on a large class of systems. Our main contribution is the development of a queuing-based analytical model, MSF-Model, that can be used to characterize and predict metastable failures. MSF-Model integrates novel modeling concepts that allow modeling metastable failures, which were intractable to model prior to our work. We also perform real experiments to reproduce and validate our model. Comparing the real experiments with the predictions of the queuing-based model shows that MSF-Model predicts metastable failures with high accuracy.

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Computer Aided Verification

    This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34th International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: Invited papers; formal methods for probabilistic programs; formal methods for neural networks; software verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: Probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book.

    Data-Driven Aircraft Assignment and Stochastic Models for Service Systems

    This dissertation consists of two parts: data-driven aircraft assignment and stochastic models for service systems. In Part I, we propose a data-driven approach to reduce delay propagation by optimizing the assignment between incoming and outgoing flights flown by an airline. There are two projects in this part. In the first project, we consider the aircraft assignment problem at a single airport. We propose a data-driven approach to estimate the assignment cost by considering covariates including the scheduled arrival time, originating airport, and aircraft type of the flights. We conclude that the stochastic assignment derived from this data-driven approach significantly outperforms the actual assignment. In the second project, we extend the first project to a network of airports by optimizing the assignment between incoming and outgoing flights at each airport in the network. We propose a similar data-driven approach to estimate the assignment costs at each airport, and show that our approach performs better than the benchmark policies. In Part II, we consider stochastic models for service systems. There are two projects in this part as well. In the first project, we consider a joint staffing and admission control problem under minimal, partial, and full information cases. We compare the profit under the different information cases over the parameter space in detail. In the second project, we consider the joint admission and service rate control problem for a general reward structure in an unobservable (minimal information case) single-server queueing system. We show that when the per-unit service cost is less than or equal to a critical value, it is optimal to admit all the customers; otherwise, it is optimal to admit none. We show that this socially optimal policy induces the customers to behave in a socially optimal way with self-regulation. Doctor of Philosophy

    Online learning on the programmable dataplane

    This thesis makes the case for managing computer networks with data-driven methods (automated statistical inference and control based on measurement data and runtime observations) and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, but their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or per-workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low-latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state.
I show that data-driven solutions still require great care to design correctly, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volumes. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible; to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms.

    Flexible Automation and Intelligent Manufacturing: The Human-Data-Technology Nexus

    This is an open access book. It gathers the first volume of the proceedings of the 31st edition of the International Conference on Flexible Automation and Intelligent Manufacturing, FAIM 2022, held on June 19–23, 2022, in Detroit, Michigan, USA. Covering four thematic areas including Manufacturing Processes, Machine Tools, Manufacturing Systems, and Enabling Technologies, it reports on advanced manufacturing processes, and innovative materials for 3D printing, applications of machine learning, artificial intelligence and mixed reality in various production sectors, as well as important issues in human-robot collaboration, including methods for improving safety. Contributions also cover strategies to improve quality control, supply chain management and training in the manufacturing industry, and methods supporting circular supply chain and sustainable manufacturing. All in all, this book provides academicians, engineers and professionals with extensive information on both scientific and industrial advances in the converging fields of manufacturing, production, and automation.

    Dynamical Modeling of Cloud Applications for Runtime Performance Management

    Cloud computing has quickly grown to become an essential component in many modern-day software applications. It allows consumers, such as a provider of some web service, to quickly and on demand obtain the necessary computational resources to run their applications. It is desirable for these service providers to keep the running cost of their cloud application low while adhering to various performance constraints. This is made difficult by the dynamics imposed by, e.g., resource contention or a changing arrival rate of users, and by the fact that there exist multiple ways of influencing the performance of a running cloud application. To facilitate decision making in this environment, performance models can be introduced that relate the workload and different actions to important performance metrics. In this thesis, such performance models of cloud applications are studied. In particular, we focus on modeling using queueing theory and on the fluid model for approximating the often intractable dynamics of the queue lengths. First, existing results on how the fluid model can be obtained from the mean-field approximation of a closed queueing network are simplified and extended to allow for mixed networks. The queues are allowed to follow the processor sharing or delay disciplines, and can have multiple classes with phase-type service times. An improvement to this fluid model is then presented to increase accuracy when the system size, i.e., number of servers, initial population, and arrival rate, is small. Furthermore, a closed-form approximation of the response time CDF is presented. The methods are tested in a series of simulation experiments and shown to be accurate. This mean-field fluid model is then used to derive a general fluid model for microservices with interservice delays. The model is shown to be completely extractable at runtime in a distributed fashion.
It is further evaluated on a simple microservice application and found to accurately predict important performance metrics in most cases. Furthermore, a method is devised to reduce the cost of a running application by tuning load balancing parameters between replicas. The method is built on gradient stepping by applying automatic differentiation to the fluid model. This allows for arbitrarily defined cost functions and constraints, most notably including different response time percentiles. The method is tested on a simple application distributed over multiple computing clusters and is shown to reduce costs while adhering to percentile constraints. Finally, modeling of request cloning is studied using the novel concept of synchronized service. This allows certain forms of cloning over servers, each modeled with a single queue, to be equivalently expressed as one single queue. The concept is very general regarding the involved queueing discipline and distributions, but instead introduces new, less realistic assumptions. How the equivalent queue model is affected by relaxing these assumptions is studied considering the processor sharing discipline, and an extension to enable modeling of speculative execution is made. In a simulation campaign, it is shown that these relaxations have only a minor effect in certain cases.

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f

    Stability Problems for Stochastic Models: Theory and Applications II

    Most papers published in this Special Issue of Mathematics are written by the participants of the XXXVI International Seminar on Stability Problems for Stochastic Models, 21–25 June, 2021, Petrozavodsk, Russia. The scope of the seminar embraces the following topics: Limit theorems and stability problems; Asymptotic theory of stochastic processes; Stable distributions and processes; Asymptotic statistics; Discrete probability models; Characterization of probability distributions; Insurance and financial mathematics; Applied statistics; Queueing theory; and other fields. This Special Issue contains 12 papers by specialists who represent 6 countries: Belarus, France, Hungary, India, Italy, and Russia.