Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.
Comment: 20 pages, 11 figures, appears in CCGrid, 201
Containers: A Sound Basis For a True Single System Image
Clusters of SMPs are attractive for executing shared-memory parallel applications, but reconciling high performance and ease of programming remains an open issue. A possible approach is to provide an efficient Single System Image (SSI) operating system giving the illusion of an SMP machine. In this paper, we introduce the concept of a container as a mechanism to unify global resource management at the lowest operating system level. Higher-level operating system services such as the virtual memory system and file cache can be easily implemented on top of containers and transparently benefit from the whole memory resource available in the cluster.
Saline: Improving Best-Effort Job Management in Grids
Although virtualization technologies have recently gained a lot of interest in Grid computing as they allow flexible resource management, the most common way to exploit grids still relies on dedicated services like resource management systems (RMSs) to get resources at a particular time. To improve resource usage, most of these systems provide a best-effort mode where the lowest-priority jobs can be executed when resources are idle. This mode does not provide any guarantee of service, and jobs may be killed at any time by the RMS when the nodes they use are subject to higher-priority reservations. This behaviour potentially leads to a huge waste of computation time, or at least requires users to checkpoint their best-effort jobs. In this paper, we present Saline, a generic and non-intrusive framework to manage best-effort jobs at grid level through the use of virtual machines (VMs). We discuss the main challenges concerning the design of such a grid system, focusing on VM snapshot management and network configuration. Preliminary experimental results indicate the value of our proposal for ensuring efficient execution of best-effort jobs across the whole grid.
Incremental-LDI for Multi-View Coding
This paper describes an incremental algorithm for Layer Depth Image construction (I-LDI) from multi-view plus depth data sets. A solution to sampling artifacts is proposed, based on pixel interpolation (inpainting) restricted to isolated unknown pixels. A solution to ghosting artifacts is also proposed, based on depth discontinuity detection followed by a local foreground/background classification. We propose a formulation of the warping equations which reduces time consumption, specifically for LDI warping. Tests on the Breakdancers and Ballet MVD data sets show that extra layers in I-LDI contain only 10% of first-layer pixels, compared to 50% for LDI. I-LDI layers are also more compact, with a less scattered pixel distribution, and thus easier to compress than LDI layers. Visual rendering is of similar quality with I-LDI and LDI.
Implementing atomic rendezvous within a transactional framework
The authors address the problem of implementing the CSP (communicating sequential processes) rendezvous within a transactional framework. Instead of implementing a fair nondeterministic choice and assuming the correct functioning of processors and communication media, the authors propose an efficient transactional implementation of the atomic rendezvous in the presence of processor failures in a multiprocessor machine. Both atomicity and efficiency are obtained by using a high-speed stable storage device.