798 research outputs found
Programming your way out of the past: ISIS and the META Project
The ISIS distributed programming system and the META Project are described. The ISIS programming toolkit is an aid to low-level programming that makes it easy to build fault-tolerant distributed applications that exploit replication and concurrent execution. The META Project is reexamining high-level mechanisms such as the filesystem, shell language, and administration tools in distributed systems
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201
Group Communication in Amoeba and its Applications
Unlike many other operating systems, Amoeba is a distributed operating system that provides group communication (i.e., one-to-many communication). We wil
ISIS and META projects
The ISIS project has developed a new methodology, virtual synchony, for writing robust distributed software. High performance multicast, large scale applications, and wide area networks are the focus of interest. Several interesting applications that exploit the strengths of ISIS, including an NFS-compatible replicated file system, are being developed. The META project is distributed control in a soft real-time environment incorporating feedback. This domain encompasses examples as diverse as monitoring inventory and consumption on a factory floor, and performing load-balancing on a distributed computing system. One of the first uses of META is for distributed application management: the tasks of configuring a distributed program, dynamically adapting to failures, and monitoring its performance. Recent progress and current plans are reported
The ISIS Project: Real Experience with a Fault Tolerant Programming System
The ISIS project has developed a distributed programming toolkit and a collection of higher level applications based on these tools. ISIS is now in use at more than 300 locations world-wise. The lessons (and surprises) gained from this experience with the real world are discussed
Intelligent architecture for automatic resource allocation in computer clusters
As the need for more reporting and assessment of information increase exponentially, computer-based applications consume resources at an alarmingly rapid rate. Therefore, traditional techniques for managing resource allocation, topology and systems need urgent revision. In this paper, we present an intelligent architecture that introduces a new strategy for managing resource discovery, allocation and dynamic reconfiguration at run-time. Our building methodology involves the employment of new types of clustered systems based on large application groupings, each having a master cluster controller. Each controlling engine consists of self-healing intelligent entities that can compensate for a variety of software or hardware
problems. We also present evaluation results of extensive
experiments in a production environment, which demonstrate the advantages of our approach
- …