47,235 research outputs found

    Stateful data-parallel processing

    Get PDF
    Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial—it brings domain-specific knowledge from broad fields—but data scientists do not have adequate tools to write algorithms and execute them at scale. Processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs—with explicit state that they can update, such as matrices in machine learning algorithms—and are used to imperative-style languages. These programs struggle to execute with high-performance in stateless data-parallel systems. Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines. In the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses the previous challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) an integrated approach to scale out and fault tolerance that recovers large state—spanning the memory of multiple machines. To support imperative-style programs a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems.Open Acces

    Detection of faults and software reliability analysis

    Get PDF
    Multi-version or N-version programming is proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. These versions are executed in parallel in the application environment; each receives identical inputs and each produces its version of the required outputs. The outputs are collected by a voter and, in principle, they should all be the same. In practice there may be some disagreement. If this occurs, the results of the majority are taken to be the correct output, and that is the output used by the system. A total of 27 programs were produced. Each of these programs was then subjected to one million randomly-generated test cases. The experiment yielded a number of programs containing faults that are useful for general studies of software reliability as well as studies of N-version programming. Fault tolerance through data diversity and analytic models of comparison testing are discussed

    Making State Explicit for Imperative Big Data Processing

    Get PDF
    Data scientists often implement machine learning algorithms in imperative languages such as Java, Matlab and R. Yet such implementations fail to achieve the performance and scalability of specialised data-parallel processing frameworks. Our goal is to execute imperative Java programs in a data-parallel fashion with high throughput and low latency. This raises two challenges: how to support the arbitrary mutable state of Java programs without compromising scalability, and how to re cover that state after failure with low overhead. Our idea is to infer the dataflow and the types of state accesses from a Java program and use this information to generate a stateful dataflow graph (SDG) . By explicitly separating data from mutablestate, SDGs have specific features to enable this translation: to ensure scalability, distributed state can be partitioned across nodes if computation can occur entirely in parallel; if this is not possible, partial state gives nodes local instances for independent computation, which are reconciled according to application semantics. For fault tolerance, large inmemory state is checkpointed asynchronously without global coordination. We show that the performance of SDGs for several imperative online applications matches that of existing data-parallel processing frameworks

    Fault Tolerance for High-Performance Applications Using Structured Parallelism Models

    Get PDF
    In the last years parallel computing has increasingly exploited the high-level models of structured parallel programming, an example of which are algorithmic skeletons. This trend has been motivated by the properties featuring structured parallelism models, which can be used to derive several (static and dynamic) optimizations at various implementation levels. In this thesis we study the properties of structured parallel models useful for attacking the issue of providing a fault tolerance support oriented towards High-Performance applications. This issue has been traditionally faced in two ways: (i) in the context of unstructured parallelism models (e.g. MPI), which computation model is essentially based on a distributed set of processes communicating through message-passing, with an approach based on checkpointing and rollback recovery or software replication; (ii) in the context of high-level models, based on a specific parallelism model (e.g. data-flow) and/or an implementation model (e.g. master-slave), by introducing specific techniques based on the properties of the programming and computation models themselves. In this thesis we make a step towards a more abstract viewpoint and we highlight the properties of structured parallel models interesting for fault tolerance purposes. We consider two classes of parallel programs (namely task parallel and data parallel) and we introduce a fault tolerance support based on checkpointing and rollback recovery. The support is derived according to the high-level properties of the parallel models: we call this derivation specialization of fault tolerance techniques, highlighting the difference with classical solutions supporting structure-unaware computations. As a consequence of this specialization, the introduced fault tolerance techniques can be configured and optimized to meet specific needs at different implementation levels. That is, the supports we present do not target a single computing platform or a specific class of them. Indeed the specializations are the mechanism to target specific issues of the exploited environment and of the implemented applications, as proper choices of the protocols and their configurations

    Algorithmic Based Fault Tolerance Applied to High Performance Computing

    Full text link
    We present a new approach to fault tolerance for High Performance Computing system. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance technique (Huang and Abraham, 1984) to the need of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit-flip) on the fly of a computation. To assess the viability of our approach, we have developed a fault tolerant matrix-matrix multiplication subroutine and we propose some models to predict its running time. Our parallel fault-tolerant matrix-matrix multiplication scores 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result while one process failure has happened. This represents 65% of the machine peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict (and have observed) that, as we increase the processor count, the overhead of the fault tolerance drops significantly

    Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

    Full text link
    Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock states, from where the program has no outgoing transitions. Of these, the former closely resembles with model checking and, hence, techniques for efficient verification are directly applicable to it. Hence, we focus on expediting the latter with the use of multi-core technology. We present two approaches for parallelization by considering different design choices. The first approach is based on the computation of equivalence classes of program transitions (called group computation) that are needed due to the issue of distribution (i.e., inability of processes to atomically read and write all program variables). We show that in most cases the speedup of this approach is close to the ideal speedup and in some cases it is superlinear. The second approach uses traditional technique of partitioning deadlock states among multiple threads. However, our experiments show that the speedup for this approach is small. Consequently, our analysis demonstrates that a simple approach of parallelizing the group computation is likely to be the effective method for using multi-core computing in the context of deadlock resolution
    • …
    corecore