768 research outputs found

    Reliable Messaging to Millions of Users with MigratoryData

    Full text link
    Web-based notification services are used by a large range of businesses to selectively distribute live updates to customers, following the publish/subscribe (pub/sub) model. Typical deployments can involve millions of subscribers expecting ordering and delivery guarantees together with low latencies. Notification services must be vertically and horizontally scalable, and adopt replication to provide a reliable service. We report our experience building and operating MigratoryData, a highly-scalable notification service. We discuss the typical requirements of MigratoryData customers, and describe the architecture and design of the service, focusing on scalability and fault tolerance. Our evaluation demonstrates the ability of MigratoryData to handle millions of concurrent connections and support a reliable notification service despite server failures and network disconnections

    Automatic Reconfiguration for Large-Scale Reliable Storage Systems

    Get PDF
    Byzantine-fault-tolerant replication enhances the availability and reliability of Internet services that store critical state and preserve it despite attacks or software errors. However, existing Byzantine-fault-tolerant storage systems either assume a static set of replicas, or have limitations in how they handle reconfigurations (e.g., in terms of the scalability of the solutions or the consistency levels they provide). This can be problematic in long-lived, large-scale systems where system membership is likely to change during the system lifetime. In this paper, we present a complete solution for dynamically changing system membership in a large-scale Byzantine-fault-tolerant system. We present a service that tracks system membership and periodically notifies other system nodes of membership changes. The membership service runs mostly automatically, to avoid human configuration errors; is itself Byzantine-fault-tolerant and reconfigurable; and provides applications with a sequence of consistent views of the system membership. We demonstrate the utility of this membership service by using it in a novel distributed hash table called dBQS that provides atomic semantics even across changes in replica sets. dBQS is interesting in its own right because its storage algorithms extend existing Byzantine quorum protocols to handle changes in the replica set, and because it differs from previous DHTs by providing Byzantine fault tolerance and offering strong semantics. We implemented the membership service and dBQS. Our results show that the approach works well, in practice: the membership service is able to manage a large system and the cost to change the system membership is low

    Extended Fault Taxonomy of SOA-Based Systems

    Get PDF
    Service Oriented Architecture (SOA) is considered as a standard for enterprise software development. The main characteristics of SOA are dynamic discovery and composition of software services in a heterogeneous environment. These properties pose newer challenges in fault management of SOA-based systems (SBS). A proper understanding of different faults in an SBS is very necessary for effective fault handling. A comprehensive three-fold fault taxonomy is presented here that covers distributed, SOA specific and non-functional faults in a holistic manner. A comprehensive fault taxonomy is a key starting point for providing techniques and methods for accessing the quality of a given system. In this paper, an attempt has been made to outline several SBSs faults into a well-structured taxonomy that may assist developers to plan suitable fault repairing strategies. Some commonly emphasized fault recovery strategies are also discussed. Some challenges that may occur during fault handling of SBSs are also mentioned
    corecore