22,416 research outputs found
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201
Recommended from our members
Leveraging simulation practice in industry through use of desktop grid middleware
This chapter focuses on the collaborative use of computing resources to support decision making in industry. Through the use of middleware for desktop grid computing, the idle CPU cycles available on existing computing resources can be harvested and used for speeding-up the execution of applications that have “non-trivial” processing requirements. This chapter focuses on the desktop grid middleware BOINC and Condor, and discusses the integration of commercial simulation software together with free-to-download grid middleware so as to offer competitive advantage to organizations that opt for this technology. It is expected that the low-intervention integration approach presented in this chapter (meaning no changes to source code required) will appeal to both simulation practitioners (as simulations can be executed faster, which in turn would mean that more replications and optimization is possible in the same amount of time) and the management (as it can potentially increase the return on investment on existing resources)
CERN openlab Whitepaper on Future IT Challenges in Scientific Research
This whitepaper describes the major IT challenges in scientific research at CERN and several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories and also input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the view of their organisations and/or affiliates
Investigating grid computing technologies for use with commercial simulation packages
As simulation experimentation in industry become more computationally demanding, grid computing can be seen as a promising technology that has the potential to bind together the computational resources needed to quickly execute such simulations. To investigate how this might be possible, this paper reviews the grid technologies that can be used together with commercial-off-the-shelf simulation packages (CSPs) used in industry. The paper identifies two specific forms of grid computing (Public Resource Computing and Enterprise-wide Desktop Grid Computing) and the middleware associated with them (BOINC and Condor) as being suitable for grid-enabling existing CSPs. It further proposes three different CSP-grid integration approaches and identifies one of them to be the most appropriate. It is hoped that this research will encourage simulation practitioners to consider grid computing as a technologically viable means of executing CSP-based experiments faster
An Approach to Ad hoc Cloud Computing
We consider how underused computing resources within an enterprise may be
harnessed to improve utilization and create an elastic computing
infrastructure. Most current cloud provision involves a data center model, in
which clusters of machines are dedicated to running cloud infrastructure
software. We propose an additional model, the ad hoc cloud, in which
infrastructure software is distributed over resources harvested from machines
already in existence within an enterprise. In contrast to the data center cloud
model, resource levels are not established a priori, nor are resources
dedicated exclusively to the cloud while in use. A participating machine is not
dedicated to the cloud, but has some other primary purpose such as running
interactive processes for a particular user. We outline the major
implementation challenges and one approach to tackling them
- …