Programming your way out of the past: ISIS and the META Project

Author: Birman Kenneth P.
Marzullo Keith

The ISIS distributed programming system and the META Project are described. The ISIS programming toolkit is an aid to low-level programming that makes it easy to build fault-tolerant distributed applications that exploit replication and concurrent execution. The META Project is reexamining high-level mechanisms such as the filesystem, shell language, and administration tools in distributed systems.

NASA Technical Reports Server

Checkpointing as a Service in Heterogeneous Cloud Environments

Author: Cao Jiajun
Cooperman Gene
Morin Christine
Simonin Matthieu
Publication date: 07/11/2014

A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform mechanism within the cloud itself serves two purposes: (a) direct support for long-running jobs, which would otherwise require a custom fault-tolerant mechanism for each application; and (b) the administrative capability to manage an over-subscribed cloud by temporarily swapping out jobs when higher priority jobs arrive. An advantage of this uniform approach is that it also supports parallel and distributed computations, over both TCP and InfiniBand, thus allowing traditional HPC applications to take advantage of an existing cloud infrastructure. Additionally, an integrated health-monitoring mechanism detects when long-running jobs either fail or incur exceptionally low performance, perhaps due to resource starvation, and proactively suspends the job. The cloud-agnostic feature is demonstrated by applying the implementation to two very different cloud platforms: Snooze and OpenStack. The use of a cloud-agnostic architecture also enables, for the first time, migration of applications from one cloud platform to another. Comment: 20 pages, 11 figures, appears in CCGrid, 201

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Group Communication in Amoeba and its Applications

Author: Kaashoek M.F.
Tanenbaum A.S.
Verstoep K.
Publication date: 01/01/1993

Unlike many other operating systems, Amoeba is a distributed operating system that provides group communication (i.e., one-to-many communication). We wil

CiteSeerX

VU Research Portal

Group Communication in the Amoeba Distributed Operating System

Author: Kaashoek M.F.
Tanenbaum A.S.
Publication date: 01/01/1991

VU Research Portal

ISIS and META projects

Author: Birman Kenneth
Cooper Robert
Marzullo Keith

The ISIS project has developed a new methodology, virtual synchrony, for writing robust distributed software. High performance multicast, large scale applications, and wide area networks are the focus of interest. Several interesting applications that exploit the strengths of ISIS, including an NFS-compatible replicated file system, are being developed. The META project is distributed control in a soft real-time environment incorporating feedback. This domain encompasses examples as diverse as monitoring inventory and consumption on a factory floor, and performing load-balancing on a distributed computing system. One of the first uses of META is for distributed application management: the tasks of configuring a distributed program, dynamically adapting to failures, and monitoring its performance. Recent progress and current plans are reported.

NASA Technical Reports Server

The ISIS Project: Real Experience with a Fault Tolerant Programming System

Author: Birman Kenneth
Cooper Robert
Publication date: 01/07/1990

The ISIS project has developed a distributed programming toolkit and a collection of higher level applications based on these tools. ISIS is now in use at more than 300 locations worldwide. The lessons (and surprises) gained from this experience with the real world are discussed.

NASA Technical Reports Server

eCommons@Cornell

Intelligent architecture for automatic resource allocation in computer clusters

Author: Corsava S.
Getov Vladimir
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2003

As the need for more reporting and assessment of information increases exponentially, computer-based applications consume resources at an alarmingly rapid rate. Therefore, traditional techniques for managing resource allocation, topology and systems need urgent revision. In this paper, we present an intelligent architecture that introduces a new strategy for managing resource discovery, allocation and dynamic reconfiguration at run-time. Our building methodology involves the employment of new types of clustered systems based on large application groupings, each having a master cluster controller. Each controlling engine consists of self-healing intelligent entities that can compensate for a variety of software or hardware problems. We also present evaluation results of extensive experiments in a production environment, which demonstrate the advantages of our approach.

Crossref

WestminsterResearch