3,030 research outputs found

    Tolerating Correlated Failures in Massively Parallel Stream Processing Engines

    Full text link
    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs a long recovery latency especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework, which is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks while only a selected set of tasks will be actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs will be generated before the completion of the recovery process. We also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach

    Distributed Management of Massive Data: an Efficient Fine-Grain Data Access Scheme

    Get PDF
    This paper addresses the problem of efficiently storing and accessing massive data blocks in a large-scale distributed environment, while providing efficient fine-grain access to data subsets. This issue is crucial in the context of applications in the field of databases, data mining and multimedia. We propose a data sharing service based on distributed, RAM-based storage of data, while leveraging a DHT-based, natively parallel metadata management scheme. As opposed to the most commonly used grid storage infrastructures that provide mechanisms for explicit data localization and transfer, we provide a transparent access model, where data are accessed through global identifiers. Our proposal has been validated through a prototype implementation whose preliminary evaluation provides promising results

    Parallel Architectures for Planetary Exploration Requirements (PAPER)

    Get PDF
    The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX Fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concepts appears to be a promising candidate, except that more detailed information is required. The feasibility for adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified

    Advanced flight control system study

    Get PDF
    A fly by wire flight control system architecture designed for high reliability includes spare sensor and computer elements to permit safe dispatch with failed elements, thereby reducing unscheduled maintenance. A methodology capable of demonstrating that the architecture does achieve the predicted performance characteristics consists of a hierarchy of activities ranging from analytical calculations of system reliability and formal methods of software verification to iron bird testing followed by flight evaluation. Interfacing this architecture to the Lockheed S-3A aircraft for flight test is discussed. This testbed vehicle can be expanded to support flight experiments in advanced aerodynamics, electromechanical actuators, secondary power systems, flight management, new displays, and air traffic control concepts

    Storage and Search in Dynamic Peer-to-Peer Networks

    Full text link
    We study robust and efficient distributed algorithms for searching, storing, and maintaining data in dynamic Peer-to-Peer (P2P) networks. P2P networks are highly dynamic networks that experience heavy node churn (i.e., nodes join and leave the network continuously over time). Our goal is to guarantee, despite high node churn rate, that a large number of nodes in the network can store, retrieve, and maintain a large number of data items. Our main contributions are fast randomized distributed algorithms that guarantee the above with high probability (whp) even under high adversarial churn: 1. A randomized distributed search algorithm that (whp) guarantees that searches from as many as no(n)n - o(n) nodes (nn is the stable network size) succeed in O(logn){O}(\log n)-rounds despite O(n/log1+δn){O}(n/\log^{1+\delta} n) churn, for any small constant δ>0\delta > 0, per round. We assume that the churn is controlled by an oblivious adversary (that has complete knowledge and control of what nodes join and leave and at what time, but is oblivious to the random choices made by the algorithm). 2. A storage and maintenance algorithm that guarantees (whp) data items can be efficiently stored (with only Θ(logn)\Theta(\log{n}) copies of each data item) and maintained in a dynamic P2P network with churn rate up to O(n/log1+δn){O}(n/\log^{1+\delta} n) per round. Our search algorithm together with our storage and maintenance algorithm guarantees that as many as no(n)n - o(n) nodes can efficiently store, maintain, and search even under O(n/log1+δn){O}(n/\log^{1+\delta} n) churn per round. Our algorithms require only polylogarithmic in nn bits to be processed and sent (per round) by each node. To the best of our knowledge, our algorithms are the first-known, fully-distributed storage and search algorithms that provably work under highly dynamic settings (i.e., high churn rates per step).Comment: to appear at SPAA 201

    An Arbitrary 2D Structured Replica Control Protocol

    Get PDF
    Traditional replication protocols that logically arrange the replicas into a specific structure have reasonable availability, lower communication cost as well as system load than those that do not require any logical organisation of replicas. We propose in this paper the A2DS protocol: a single protocol that, unlike the existing proposed protocols, can be adapted to any 2D structure. Its read operation is carried out on any replica of every level of the structure whereas write operations are performed on all replicas of a single level of the structure. We present several basic 2D structures and introduce the new idea of obtaining other 2D structures by the composition of several basic ones. Two structures are proposed that have near optimal performance in terms of the communication cost, availability and system load of their read and write operations. Also, we introduce a new protocol that provides better performance for its write operations than those of ROWA protocol while preserving similar read performance
    corecore