24,441 research outputs found

    Managing Response Time in a Call-Routing Problem with Service Failure

    Full text link

    A peer-to-peer infrastructure for resilient web services

    Get PDF
    This work is funded by GR/M78403 “Supporting Internet Computation in Arbitrary Geographical Locations” and GR/R51872 “Reflective Application Framework for Distributed Architectures”, and by Nuffield Grant URB/01597/G “Peer-to-Peer Infrastructure for Autonomic Storage Architectures”This paper describes an infrastructure for the deployment and use of Web Services that are resilient to the failure of the nodes that host those services. The infrastructure presents a single interface that provides mechanisms for users to publish services and to find hosted services. The infrastructure supports the autonomic deployment of services and the brokerage of hosts on which services may be deployed. Once deployed, services are autonomically managed in a number of aspects including load balancing, availability, failure detection and recovery, and lifetime management. Services are published and deployed with associated metadata describing the service type. This same metadata may be used subsequently by interested parties to discover services. The infrastructure uses peer-to-peer (P2P) overlay technologies to abstract over the underlying network to deploy and locate instances of those services. It takes advantage of the P2P network to replicate directory services used to locate service instances (for using a service), Service Hosts (for deployment of services) and Autonomic Managers which manage the deployed services. The P2P overlay network is itself constructed using novel Web Services-based middleware and a variation of the Chord P2P protocol, which is self-managing.Postprin

    A Peer-to-Peer Middleware Framework for Resilient Persistent Programming

    Get PDF
    The persistent programming systems of the 1980s offered a programming model that integrated computation and long-term storage. In these systems, reliable applications could be engineered without requiring the programmer to write translation code to manage the transfer of data to and from non-volatile storage. More importantly, it simplified the programmer's conceptual model of an application, and avoided the many coherency problems that result from multiple cached copies of the same information. Although technically innovative, persistent languages were not widely adopted, perhaps due in part to their closed-world model. Each persistent store was located on a single host, and there were no flexible mechanisms for communication or transfer of data between separate stores. Here we re-open the work on persistence and combine it with modern peer-to-peer techniques in order to provide support for orthogonal persistence in resilient and potentially long-running distributed applications. Our vision is of an infrastructure within which an application can be developed and distributed with minimal modification, whereupon the application becomes resilient to certain failure modes. If a node, or the connection to it, fails during execution of the application, the objects are re-instantiated from distributed replicas, without their reference holders being aware of the failure. Furthermore, we believe that this can be achieved within a spectrum of application programmer intervention, ranging from minimal to totally prescriptive, as desired. The same mechanisms encompass an orthogonally persistent programming model. We outline our approach to implementing this vision, and describe current progress.Comment: Submitted to EuroSys 200

    A recursive approach to network management

    Full text link
    Nowadays there is an increasing need for a general management paradigm which can simplify network management and further enable network innovations. In this paper, in response to limitations of current Software Defined Networking (SDN) management solutions, we propose a recursive approach to enterprise network management, where network management is done through managing various Virtual Transport Networks (VTNs). Different from the traditional virtual network model which mainly focuses on routing/tunneling, our VTN provides communication service with explicit Quality-of-Service (QoS) support for applications via transport flows, and it involves all mechanisms (e:g:, routing, addressing, error and flow control, resource allocation) needed to support such transport flows. Based on this approach, we design and implement a management layer, which recurses the same VTN-based management mechanism for enterprise network management. Comparing with an SDN-based management approach, our experimental results show that our management layer achieves better network performance

    Proactive cloud management for highly heterogeneous multi-cloud infrastructures

    Get PDF
    Various literature studies demonstrated that the cloud computing paradigm can help to improve availability and performance of applications subject to the problem of software anomalies. Indeed, the cloud resource provisioning model enables users to rapidly access new processing resources, even distributed over different geographical regions, that can be promptly used in the case of, e.g., crashes or hangs of running machines, as well as to balance the load in the case of overloaded machines. Nevertheless, managing a complex geographically-distributed cloud deploy could be a complex and time-consuming task. Autonomic Cloud Manager (ACM) Framework is an autonomic framework for supporting proactive management of applications deployed over multiple cloud regions. It uses machine learning models to predict failures of virtual machines and to proactively redirect the load to healthy machines/cloud regions. In this paper, we study different policies to perform efficient proactive load balancing across cloud regions in order to mitigate the effect of software anomalies. These policies use predictions about the mean time to failure of virtual machines. We consider the case of heterogeneous cloud regions, i.e regions with different amount of resources, and we provide an experimental assessment of these policies in the context of ACM Framework

    Unattended network operations technology assessment study. Technical support for defining advanced satellite systems concepts

    Get PDF
    The results are summarized of an unattended network operations technology assessment study for the Space Exploration Initiative (SEI). The scope of the work included: (1) identified possible enhancements due to the proposed Mars communications network; (2) identified network operations on Mars; (3) performed a technology assessment of possible supporting technologies based on current and future approaches to network operations; and (4) developed a plan for the testing and development of these technologies. The most important results obtained are as follows: (1) addition of a third Mars Relay Satellite (MRS) and MRS cross link capabilities will enhance the network's fault tolerance capabilities through improved connectivity; (2) network functions can be divided into the six basic ISO network functional groups; (3) distributed artificial intelligence technologies will augment more traditional network management technologies to form the technological infrastructure of a virtually unattended network; and (4) a great effort is required to bring the current network technology levels for manned space communications up to the level needed for an automated fault tolerance Mars communications network

    Multi-layer virtual transport network management

    Full text link
    Nowadays there is an increasing need for a general paradigm which can simplify network management and further enable network innovations. Software Defined Networking (SDN) is an efficient way to make the network programmable and reduce management complexity, however it is plagued with limitations inherited from the legacy Internet (TCP/IP) architecture. In this paper, in response to limitations of current Software Defined Networking (SDN) management solutions, we propose a recursive approach to enterprise network management, where network management is done through managing various Virtual Transport Networks (VTNs) over different scopes (i.e., regions of operation). Different from the traditional virtual network model which mainly focuses on routing/tunneling, our VTN provides communication service with explicit Quality-of-Service (QoS) support for applications via transport flows, and it involves all mechanisms (e.g., addressing, routing, error and flow control, resource allocation) needed to support such transport flows. Based on this approach, we design and implement a management architecture, which recurses the same VTN-based management mechanism for enterprise network management. Our experimental results show that our management architecture achieves better performance.National Science Foundation awards: CNS-0963974 and CNS-1346688

    Resilience markers for safer systems and organisations

    Get PDF
    If computer systems are to be designed to foster resilient performance it is important to be able to identify contributors to resilience. The emerging practice of Resilience Engineering has identified that people are still a primary source of resilience, and that the design of distributed systems should provide ways of helping people and organisations to cope with complexity. Although resilience has been identified as a desired property, researchers and practitioners do not have a clear understanding of what manifestations of resilience look like. This paper discusses some examples of strategies that people can adopt that improve the resilience of a system. Critically, analysis reveals that the generation of these strategies is only possible if the system facilitates them. As an example, this paper discusses practices, such as reflection, that are known to encourage resilient behavior in people. Reflection allows systems to better prepare for oncoming demands. We show that contributors to the practice of reflection manifest themselves at different levels of abstraction: from individual strategies to practices in, for example, control room environments. The analysis of interaction at these levels enables resilient properties of a system to be ‘seen’, so that systems can be designed to explicitly support them. We then present an analysis of resilience at an organisational level within the nuclear domain. This highlights some of the challenges facing the Resilience Engineering approach and the need for using a collective language to articulate knowledge of resilient practices across domains
    • 

    corecore