
    Checkpointing as a Service in Heterogeneous Cloud Environments

    A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform mechanism within the cloud itself serves two purposes: (a) direct support for long-running jobs, which would otherwise require a custom fault-tolerant mechanism for each application; and (b) the administrative capability to manage an over-subscribed cloud by temporarily swapping out jobs when higher priority jobs arrive. An advantage of this uniform approach is that it also supports parallel and distributed computations, over both TCP and InfiniBand, thus allowing traditional HPC applications to take advantage of an existing cloud infrastructure. Additionally, an integrated health-monitoring mechanism detects when long-running jobs either fail or incur exceptionally low performance, perhaps due to resource starvation, and proactively suspends the job. The cloud-agnostic feature is demonstrated by applying the implementation to two very different cloud platforms: Snooze and OpenStack. The use of a cloud-agnostic architecture also enables, for the first time, migration of applications from one cloud platform to another. Comment: 20 pages, 11 figures, appears in CCGrid, 201
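    A minimal sketch of the health-monitoring behaviour described in the abstract: periodically poll a long-running job, restart it from a checkpoint on failure, and checkpoint-and-suspend it when performance drops. The `Job` class, method names, and thresholds are illustrative assumptions, not the service's actual API.

    ```python
    import time
    from dataclasses import dataclass

    # Hypothetical job handle; the abstract does not name the service's real API.
    @dataclass
    class Job:
        name: str
        alive: bool = True
        throughput: float = 100.0   # work units/second, as reported by the application

        def restart_from_checkpoint(self):
            print(f"{self.name}: failed -> restarting from last checkpoint")

        def checkpoint_and_suspend(self):
            print(f"{self.name}: exceptionally slow -> checkpointing and suspending")

    MIN_THROUGHPUT = 10.0   # assumed threshold for "exceptionally low performance"
    POLL_INTERVAL = 30      # seconds between health checks

    def health_check(job: Job) -> None:
        """One monitoring pass: react to failure or to resource starvation."""
        if not job.alive:
            job.restart_from_checkpoint()
        elif job.throughput < MIN_THROUGHPUT:
            job.checkpoint_and_suspend()

    def monitor(job: Job, checks: int = 3) -> None:
        """Poll the job a fixed number of times (a real monitor would loop until completion)."""
        for _ in range(checks):
            health_check(job)
            time.sleep(POLL_INTERVAL)

    health_check(Job("long_running_solver", throughput=4.0))
    ```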

    Adaptive scheduling solution for grid meta-brokering

    The nearly optimal, interoperable utilization of various grid resources plays an important role in the world of grids. Although well-designed, evaluated and widely used resource brokers have been developed, these existing solutions still cannot cope with the high uncertainty that rules current grid systems. To ease the simultaneous utilization of different middleware systems, researchers need to revise current solutions. In this paper we propose advanced scheduling techniques with a weighted fitness function for an adaptive Meta-Brokering Grid Service, which enables higher-level utilization of the existing grid brokers. We also set up a grid simulation environment to demonstrate the efficiency of the proposed meta-level scheduling solution. The presented evaluation results show that the proposed novel scheduling technique delivers better performance in the meta-brokering context.
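    To make the weighted-fitness idea concrete, the sketch below ranks brokers by a weighted sum of per-broker properties and dispatches to the highest-scoring one. The metric names and weights are assumptions for illustration; the paper's actual fitness function and properties may differ.

    ```python
    # Illustrative weighted-fitness broker selection (not the paper's exact formula).
    WEIGHTS = {"success_rate": 0.5, "avg_load": -0.3, "queue_length": -0.2}

    def fitness(broker_stats: dict) -> float:
        """Weighted sum of per-broker properties (higher is better)."""
        return sum(WEIGHTS[k] * broker_stats[k] for k in WEIGHTS)

    def select_broker(brokers: dict) -> str:
        """Pick the broker with the highest fitness score for the next job."""
        return max(brokers, key=lambda name: fitness(brokers[name]))

    brokers = {
        "broker_A": {"success_rate": 0.92, "avg_load": 0.60, "queue_length": 0.10},
        "broker_B": {"success_rate": 0.85, "avg_load": 0.30, "queue_length": 0.05},
    }
    print(select_broker(brokers))
    ```

    An adaptive meta-broker would additionally update the weights or the per-broker statistics at run time as broker behaviour changes.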

    Developing a distributed electronic health-record store for India

    The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India.

    Fault-Tolerant Dynamic Deduplication for Utility Computing

    Utility computing is an increasingly important paradigm, whereby computing resources are provided on-demand as utilities. An important component of utility computing is storage: data volumes are growing rapidly, and mechanisms to mitigate this growth need to be developed. Data deduplication is a promising technique for drastically reducing the amount of data stored in such systems; however, current approaches are static in nature, using an amount of redundancy fixed at design time, which is inappropriate for truly dynamic modern systems. We propose a real-time adaptive deduplication system for Cloud and Utility computing that monitors changing system, user, and environmental behaviour in real time in order to balance changing storage efficiency, performance, and fault-tolerance requirements. We evaluate our system through simulation, with experimental results showing that it is both efficient and scalable. We also evaluate the fault tolerance of the system experimentally by measuring Mean Time To Repair (MTTR) and using these values to calculate system availability. The results show that higher replication levels result in higher system availability, but that the number of files in the system also affects recovery time. The trade-off between replication level and recovery time when the system is overloaded needs further investigation.
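    Deriving availability from MTTR measurements is typically done with the standard steady-state relation Availability = MTBF / (MTBF + MTTR). A minimal sketch, using made-up MTBF/MTTR figures rather than the paper's measured values:

    ```python
    def availability(mtbf_hours: float, mttr_hours: float) -> float:
        """Steady-state availability = MTBF / (MTBF + MTTR)."""
        return mtbf_hours / (mtbf_hours + mttr_hours)

    # Example: higher replication shortens repair time, raising availability.
    print(availability(mtbf_hours=1000, mttr_hours=10))   # ~0.990
    print(availability(mtbf_hours=1000, mttr_hours=2))    # ~0.998
    ```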

    Grid-enabling FIRST: Speeding up simulation applications using WinGrid

    The vision of grid computing is to make computational power, storage capacity, data and applications available to users as readily as electricity and other utilities. Grid infrastructures and applications have traditionally been geared towards dedicated, centralized, high-performance clusters running on UNIX-flavour operating systems (commonly referred to as cluster-based grid computing). This can be contrasted with desktop-based grid computing, which refers to the aggregation of non-dedicated, decentralized, commodity PCs connected through a network and running (mostly) the Microsoft Windows operating system. Large-scale adoption of such Windows-based grid infrastructure may be facilitated by grid-enabling existing Windows applications. This paper presents the WinGrid approach to grid-enabling existing Windows-based commercial-off-the-shelf (COTS) simulation packages (CSPs). Through a case study developed in conjunction with Ford Motor Company, the paper demonstrates how experimentation with the CSP Witness and FIRST can achieve a linear speedup when WinGrid is used to harness idle PC computing resources. This, combined with the lessons learned from the case study, has encouraged us to develop Web service extensions to WinGrid. It is hoped that this will facilitate wider acceptance of WinGrid among enterprises having stringent security policies in place.
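    The near-linear speedup comes from the fact that the simulation experiments are independent and can be farmed out to idle machines. A local sketch of the same master/worker pattern, using a process pool as a stand-in for WinGrid's desktop-grid workers (the real system dispatches runs to remote Windows PCs, not local processes):

    ```python
    from multiprocessing import Pool
    import time

    def run_experiment(params: int) -> float:
        """Stand-in for one CSP simulation run (e.g. a single Witness experiment)."""
        time.sleep(0.1)          # pretend to simulate
        return params * 2.0

    if __name__ == "__main__":
        experiments = list(range(16))
        with Pool(processes=4) as pool:      # four "idle PCs"
            results = pool.map(run_experiment, experiments)
        print(results)
    ```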

    Towards low cost prototyping of mobile opportunistic disconnection tolerant networks and systems

    Fast-emerging mobile edge computing, mobile clouds, the Internet of Things (IoT) and cyber-physical systems require many novel, realistic, real-time multi-layer algorithms for a wide range of domains, such as intelligent content provision and processing, smart transport, smart manufacturing systems and mobile end-user applications. This paper proposes a low-cost open-source platform, MODiToNeS, which uses commodity hardware to support prototyping and testing of fully distributed multi-layer complex algorithms over real-world (or pseudo-real) traces. The MODiToNeS platform is generic and comprises multiple interfaces that allow real-time topology and mobility control, deployment and analysis of different self-organised and self-adaptive routing algorithms, real-time content processing, and real-time environment sensing with predictive analytics. Our platform also allows rich interactivity with the user. We show deployment and analysis of two vastly different complex networking systems: a fault- and disconnection-aware smart manufacturing sensor network and cognitive privacy for personal clouds. We show that our platform design can integrate both contexts transparently and organically and allows a wide range of analyses.
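    One way to picture the "pluggable routing algorithm driven by traces" part of such a platform is an interface that receives contact events replayed from a mobility trace. The class and method names below are assumptions for illustration, not MODiToNeS's actual API.

    ```python
    from abc import ABC, abstractmethod

    class RoutingAlgorithm(ABC):
        @abstractmethod
        def on_contact(self, node_a: str, node_b: str, time_s: float) -> None:
            """Called whenever the mobility layer reports two nodes in contact."""

    class EpidemicRouting(RoutingAlgorithm):
        """Simplest disconnection-tolerant scheme: exchange everything on contact."""
        def __init__(self):
            self.buffers = {}   # node -> set of message ids

        def on_contact(self, node_a, node_b, time_s):
            a = self.buffers.setdefault(node_a, set())
            b = self.buffers.setdefault(node_b, set())
            a |= b
            b |= a

    def replay(trace, algorithm: RoutingAlgorithm):
        """Drive a routing algorithm from a (time, node_a, node_b) contact trace."""
        for time_s, a, b in sorted(trace):
            algorithm.on_contact(a, b, time_s)

    algo = EpidemicRouting()
    algo.buffers = {"n1": {"msg_1"}, "n2": set(), "n3": set()}
    replay([(0.0, "n1", "n2"), (5.0, "n2", "n3")], algo)
    print(algo.buffers["n3"])   # msg_1 reached n3 via the n2 contact
    ```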

    Enhancing reliability with Latin Square redundancy on desktop grids.

    Computational grids are some of the largest computer systems in existence today. Unfortunately they are also, in many cases, the least reliable. This research examines the use of redundancy with permutation as a method of improving reliability in computational grid applications. Three primary avenues are explored: development of a new redundancy model, the Replication and Permutation Paradigm (RPP), for computational grids; development of grid simulation software for testing RPP against other redundancy methods; and, finally, running a program on a live grid using RPP. An important part of RPP involves distributing data and tasks across the grid in Latin Square fashion. Two theorems and their proofs regarding Latin Squares are developed. The theorems describe the changing position of symbols between the rows of a standard Latin Square: when a symbol is missing because a column has been removed, they provide a basis for determining the next row and column where the missing symbol can be found. Interesting in their own right, the theorems also have implications for redundancy; in terms of the redundancy model, they allow one to state the maximum makespan in the face of missing computational hosts when using Latin Square redundancy. The simulator software was developed and used to compare different data and task distribution schemes on a simulated grid. The software clearly showed the advantage of running RPP, which resulted in faster completion times in the face of computational host failures. The Latin Square method also fails gracefully: jobs still complete under massive node failure, at the cost of increased makespan. Finally, an Inductive Logic Program (ILP) for pharmacophore search was executed, using the Latin Square redundancy methodology, on a Condor grid in the Dahlem Lab at the University of Louisville Speed School of Engineering. All jobs completed, even in the face of large numbers of randomly generated computational host failures.
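    A minimal sketch of the Latin Square placement idea, using a cyclic square for concreteness: rows act as replica sets, columns as hosts, and symbols as data/task blocks, so that removing a column (a failed host) still leaves each symbol present once in every other row. This row/column/symbol mapping is an illustrative assumption; the dissertation's theorems cover standard Latin Squares more generally.

    ```python
    # Cyclic Latin Square: entry (r, c) holds symbol (r + c) mod n.
    def cyclic_latin_square(n: int):
        return [[(r + c) % n for c in range(n)] for r in range(n)]

    def find_replicas(square, symbol: int, failed_col: int):
        """Locate every surviving (row, column) still holding `symbol`."""
        return [(r, c)
                for r, row in enumerate(square)
                for c, s in enumerate(row)
                if s == symbol and c != failed_col]

    n = 5
    square = cyclic_latin_square(n)
    failed_col = 2                      # host 2 goes down
    missing = square[0][failed_col]     # symbol lost from row 0
    print(find_replicas(square, missing, failed_col))
    # The symbol still appears exactly once in each of the other n-1 rows.
    ```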