2,701 research outputs found

    Modular Workflow Engine for Distributed Services using Lightweight Java Clients

    Full text link
    In this article we introduce the concept and the first implementation of a lightweight client-server-framework as middleware for distributed computing. On the client side an installation without administrative rights or privileged ports can turn any computer into a worker node. Only a Java runtime environment and the JAR files comprising the workflow client are needed. To connect all clients to the engine one open server port is sufficient. The engine submits data to the clients and orchestrates their work by workflow descriptions from a central database. Clients request new task descriptions periodically, thus the system is robust against network failures. In the basic set-up, data up- and downloads are handled via HTTP communication with the server. The performance of the modular system could additionally be improved using dedicated file servers or distributed network file systems. We demonstrate the design features of the proposed engine in real-world applications from mechanical engineering. We have used this system on a compute cluster in design-of-experiment studies, parameter optimisations and robustness validations of finite element structures.Comment: 14 pages, 8 figure

    Formal Configuration of Fault-Tolerant Systems

    Get PDF
    Bit flips are known to be a source of strange system behavior, failures, and crashes. They can cause dramatic financial loss, security breaches, or even harm human life. Caused by energized particles arising from, e.g., cosmic rays or heat, they are hardly avoidable. Due to transistor sizes becoming smaller and smaller, modern hardware becomes more and more prone to bit flips. This yields a high scientific interest, and many techniques to make systems more resilient against bit flips are developed. Fault-tolerance techniques are techniques that detect and react to bit flips or their effects. Before using these techniques, they typically need to be configured for the particular system they shall protect, the grade of resilience that shall be achieved, and the environment. State-of-the-art configuration approaches have a high risk of being imprecise, of being affected by undesired side effects, and of yielding questionable resilience measures. In this thesis we encourage the usage of formal methods for resiliency configuration, point out advantages and investigate difficulties. We exemplarily investigate two systems that are equipped with fault-tolerance techniques, and we apply parametric variants of probabilistic model checking to obtain optimal configurations for pre-defined resilience criteria. Probabilistic model checking is an automated formal method that operates on Markov models, i.e., state-based models with probabilistic transitions, where costs or rewards can be assigned to states and transitions. Probabilistic model checking can be used to compute, e.g., the probability of having a failure, the conditional probability of detecting an error in case of bit-flip occurrence, or the overhead that arises due to error detection and correction. Parametric variants of probabilistic model checking allow parameters in the transition probabilities and in the costs and rewards. Instead of computing values for probabilities and overhead, parametric variants compute rational functions. These functions can then be analyzed for optimality. The considered fault-tolerant systems are inspired by the work of project partners. The first system is an inter-process communication protocol as it is used in the Fiasco.OC microkernel. The communication structures provided by the kernel are protected against bit flips by a fault-tolerance technique. The second system is inspired by the redo-based fault-tolerance technique \haft. This technique protects an application against bit flips by partitioning the application's instruction flow into transaction, adding redundance, and redoing single transactions in case of error detection. Driven by these examples, we study challenges when using probabilistic model checking for fault-tolerance configuration and present solutions. We show that small transition probabilities, as they arise in error models, can be a cause of previously known accuracy issues, when using numeric solver in probabilistic model checking. We argue that the use of non-iterative methods is an acceptable alternative. We debate on the usability of the rational functions for finding optimal configurations, and show that for relatively short rational functions the usage of mathematical methods is appropriate. The redo-based fault-tolerance model suffers from the well-known state-explosion problem. We present a new technique, counter-based factorization, that tackles this problem for system models that do not scale because of a counter, as it is the case for this fault-tolerance model. This technique utilizes the chain-like structure that arises from the counter, splits the model into several parts, and computes local characteristics (in terms of rational functions) for these parts. These local characteristics can then be combined to retrieve global resiliency and overhead measures. The rational functions retrieved for the redo-based fault-tolerance model are huge - for small model instances they already have the size of more than one gigabyte. We therefor can not apply precise mathematic methods to these functions. Instead, we use the short, matrix-based representation, that arises from factorization, to point-wise evaluate the functions. Using this approach, we systematically explore the design space of the redo-based fault-tolerance model and retrieve sweet-spot configurations

    On Synchronous and Asynchronous Monitor Instrumentation for Actor-based systems

    Full text link
    We study the impact of synchronous and asynchronous monitoring instrumentation on runtime overheads in the context of a runtime verification framework for actor-based systems. We show that, in such a context, asynchronous monitoring incurs substantially lower overhead costs. We also show how, for certain properties that require synchronous monitoring, a hybrid approach can be used that ensures timely violation detections for the important events while, at the same time, incurring lower overhead costs that are closer to those of an asynchronous instrumentation.Comment: In Proceedings FOCLASA 2014, arXiv:1502.0315

    Open Source Verification under a Cloud

    Get PDF
    An experiment in providing volunteer cloud computing support for automated audits of open source code is described here, along with the supporting theory. Certification and the distributed and piecewise nature of the underlying verification computation are among the areas formalised in the theory part. The eventual aim of this research is to provide a means for open source developers who seek formally backed certification for their project to run fully automated analyses on their own source code. In order to ensure that the results are not tampered with, the computation is anonymized and shared with an ad-hoc network of volunteer CPUs for incremental completion. Each individual computation is repeated many times at different sites, and sufficient accounting data is generated to allow each computation to be refuted
    • …
    corecore