11 research outputs found
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure
Contributions to Desktop Grid Computing : From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures
Since the mid 90’s, Desktop Grid Computing - i.e the idea of using a large number of remote PCs distributed on the Internet to execute large parallel applications - has proved to be an efficient paradigm to provide a large computational power at the fraction of the cost of a dedicated computing infrastructure.This document presents my contributions over the last decade to broaden the scope of Desktop Grid Computing. My research has followed three different directions. The first direction has established new methods to observe and characterize Desktop Grid resources and developed experimental platforms to test and validate our approach in conditions close to reality. The second line of research has focused on integrating Desk- top Grids in e-science Grid infrastructure (e.g. EGI), which requires to address many challenges such as security, scheduling, quality of service, and more. The third direction has investigated how to support large-scale data management and data intensive applica- tions on such infrastructures, including support for the new and emerging data-oriented programming models.This manuscript not only reports on the scientific achievements and the technologies developed to support our objectives, but also on the international collaborations and projects I have been involved in, as well as the scientific mentoring which motivates my candidature for the Habilitation `a Diriger les Recherches
Virtual Machine Image Management for Elastic Resource Usage in Grid Computing
Grid Computing has evolved from an academic concept to a powerful paradigm in the area of high performance computing (HPC). Over the last few years, powerful Grid computing solutions were developed that allow the execution of computational tasks on distributed computing resources. Grid computing has recently attracted many commercial customers. To enable commercial customers to be able to execute sensitive data in the Grid, strong security mechanisms must be put in place to secure the customers' data.
In contrast, the development of Cloud Computing, which entered the scene in 2006, was driven by industry: it was designed with respect to security from the beginning. Virtualization technology is used to separate the users e.g., by putting the different users of a system inside a virtual machine, which prevents them from accessing other users' data.
The use of virtualization in the context of Grid computing has been examined early and was found to be a promising approach to counter the security threats that have appeared with commercial customers.
One main part of the work presented in this thesis is the Image Creation Station (ICS), a component which allows users to administer their virtual execution environments (virtual machines) themselves and which is responsible for managing and distributing the virtual machines in the entire system.
In contrast to Cloud computing, which was designed to allow even inexperienced users to execute their computational tasks in the Cloud easily, Grid computing is much more complex to use. The ICS makes it easier to use the Grid by overcoming traditional limitations like installing needed software on the compute nodes that users use to execute the computational tasks. This allows users to bring commercial software to the Grid for the first time, without the need for local administrators to install the software to computing nodes that are accessible by all users. Moreover, the administrative burden is shifted from the local Grid site's administrator to the users or experienced software providers that allow the provision of individually tailored virtual machines to each user. But the ICS is not only responsible for enabling users to manage their virtual machines themselves, it also ensures that the virtual machines are available on every site that is part of the distributed Grid system.
A second aspect of the presented solution focuses on the elasticity of the system by automatically acquiring free external resources depending on the system's current workload. In contrast to existing systems, the presented approach allows the system's administrator to add or remove resource sets during runtime without needing to restart the entire system. Moreover, the presented solution allows users to not only use existing Grid resources but allows them to scale out to Cloud resources and use these resources on-demand. By ensuring that unused resources are shut down as soon as possible, the computational costs of a given task are minimized. In addition, the presented solution allows each user to specify which resources can be used to execute a particular job. This is useful when a job processes sensitive data e.g., that is not allowed to leave the company. To obtain a comparable function in today's systems, a user must submit her computational task to a particular resource set, losing the ability to automatically schedule if more than one set of resources can be used.
In addition, the proposed solution prioritizes each set of resources by taking different metrics into account (e.g. the level of trust or computational costs) and tries to schedule the job to resources with the highest priority first. It is notable that the priority often mimics the physical distance from the resources to the user: a locally available Cluster usually has a higher priority due to the high level of trust and the computational costs, that are usually lower than the costs of using Cloud resources. Therefore, this scheduling strategy minimizes the costs of job execution by improving security at the same time since data is not necessarily transferred to remote resources and the probability of attacks by malicious external users is minimized.
Bringing both components together results in a system that adapts automatically to the current workload by using external (e.g., Cloud) resources together with existing locally available resources or Grid sites and provides individually tailored virtual execution environments to the system's users
Advancing Operating Systems via Aspect-Oriented Programming
Operating system kernels are among the most complex pieces of software in existence to-
day. Maintaining the kernel code and developing new functionality is increasingly compli-
cated, since the amount of required features has risen significantly, leading to side ef fects
that can be introduced inadvertedly by changing a piece of code that belongs to a completely
dif ferent context.
Software developers try to modularize their code base into separate functional units.
Some of the functionality or “concerns” required in a kernel, however, does not fit into
the given modularization structure; this code may then be spread over the code base and
its implementation tangled with code implementing dif ferent concerns. These so-called
“crosscutting concerns” are especially dif ficult to handle since a change in a crosscutting
concern implies that all relevant locations spread throughout the code base have to be
modified.
Aspect-Oriented Software Development (AOSD) is an approach to handle crosscutting
concerns by factoring them out into separate modules. The “advice” code contained in
these modules is woven into the original code base according to a pointcut description, a
set of interaction points (joinpoints) with the code base.
To be used in operating systems, AOSD requires tool support for the prevalent procedu-
ral programming style as well as support for weaving aspects. Many interactions in kernel
code are dynamic, so in order to implement non-static behavior and improve performance,
a dynamic weaver that deploys and undeploys aspects at system runtime is required.
This thesis presents an extension of the “C” programming language to support AOSD.
Based on this, two dynamic weaving toolkits – TOSKANA and TOSKANA-VM – are presented
to permit dynamic aspect weaving in the monolithic NetBSD kernel as well as in a virtual-
machine and microkernel-based Linux kernel running on top of L4. Based on TOSKANA,
applications for this dynamic aspect technology are discussed and evaluated.
The thesis closes with a view on an aspect-oriented kernel structure that maintains
coherency and handles crosscutting concerns using dynamic aspects while enhancing de-
velopment methods through the use of domain-specific programming languages
GRID superscalar: a programming model for the Grid
Durant els darrers anys el Grid ha sorgit com una nova plataforma per la computaciĂł distribuĂŻda. La tecnologia Gris permet unir diferents recursos de diferents dominis administratius i formar un superordinador virtual amb tots ells. Molts grups de recerca han dedicat els seus esforços a desenvolupar un conjunt de serveis bĂ sics per oferir un middleware de Grid: una capa que permet l'Ăşs del Grid. De tota manera, utilitzar aquests serveis no Ă©s una tasca fácil per molts usuaris finals, cosa que empitjora si l'expertesa d'aquests usuaris no estĂ relacionada amb la informĂ tica.Això tĂ© una influència negativa a l'hora de que la comunitat cientĂfica adopti la tecnologia Grid. Es veu com una tecnologia potent però molt difĂcil de fer servir. Per facilitar l'Ăşs del Grid Ă©s necessĂ ria una capa extra que amagui la complexitat d'aquest i permeti als usuaris programar o portar les seves aplicacions de manera senzilla.Existeixen moltes propostes d'eines de programaciĂł pel Grid. En aquesta tesi fem un resum d'algunes d'elles, i podem veure que existeixen eines conscients i no-conscients del Grid (es programen especificant o no els detalls del Grid, respectivament). A mĂ©s, molt poques d'aquestes eines poden explotar el paral·lelisme implĂcit de l'aplicaciĂł, i en la majoria d'elles, l'usuari ha de definir aquest paral·lelisme de manera explĂcita. Una altra caracterĂstica que considerem important Ă©s si es basen en llenguatges de programaciĂł molt populars (com C++ o Java), cosa que facilita l'adopciĂł per part dels usuaris finals.En aquesta tesi, el nostre objectiu principal ha estat crear un model de programaciĂł pel Grid basat en la programaciĂł seqĂĽencial i els llenguatges mĂ©s coneguts de la programaciĂł imperativa, capaç d'explotar el paral·lelisme implĂcit de les aplicacions i d'accelerar-les fent servir els recursos del Grid de manera concurrent. A mĂ©s, com el Grid Ă©s de naturalesa distribuĂŻda, heterogènia i dinĂ mica i degut tambĂ© a que el nombre de recursos que pot formar un Grid pot ser molt gran, la probabilitat de que es produeixi una errada durant l'execuciĂł d'una aplicaciĂł Ă©s elevada. Per tant, un altre dels nostres objectius ha estat tractar qualsevol tipus d'error que pugui sorgir durant l'execuciĂł d'una aplicaciĂł de manera automĂ tica (ja siguin errors relacionats amb l'aplicaciĂł o amb el Grid). GRID superscalar (GRIDSs), la principal contribuciĂł d'aquesta tesi, Ă©s un model de programaciĂł que assoleix elsobjectius mencionats proporcionant una interfĂcie molt petita i simple i un entorn d'execuciĂł que Ă©s capaç d'executar en paral·lel el codi proporcionat fent servir el Grid. La nostra interfĂcie de programaciĂł permet a un usuari programar una aplicaciĂł no-conscient del Grid, amb llenguatges imperatius coneguts i populars (com C/C++, Java, Perl o Shell script) i de manera seqĂĽencial, per tant dĂłna un pas important per ajudar als usuaris a adoptar la tecnologia Grid.Hem aplicat el nostre coneixement de l'arquitectura de computadors i el disseny de microprocessadors a l'entorn d'execuciĂł de GRIDSs. Tal com es fa a un processador superescalar, l'entorn d'execuciĂł de GRIDSs Ă©s capaç de realitzar un anĂ lisi de dependències entre les tasques que formen l'aplicaciĂł, i d'aplicar tècniques de renombrament per incrementar el seu paral·lelisme. GRIDSs genera automĂ ticament a partir del codi principal de l'usuari un graf que descriu les dependències de dades en l'aplicaciĂł. TambĂ© presentem casos d'Ăşs reals del model de programaciĂł en els camps de la quĂmica computacional i la bioinformĂ tica, que demostren que els nostres objectius han estat assolits.Finalment, hem estudiat l'aplicaciĂł de diferents tècniques per detectar i tractar fallades: checkpoint, reintent i replicaciĂł de tasques. La nostra proposta Ă©s proporcionar un entorn capaç de tractar qualsevol tipus d'errors, de manera transparent a l'usuari sempre que sigui possible. El principal avantatge d'implementar aquests mecanismos al nivell del model de programaciĂł Ă©s que el coneixement a nivell de l'aplicaciĂł pot ser explotat per crear dinĂ micament una estratègia de tolerĂ ncia a fallades per cada aplicaciĂł, i evitar introduir sobrecĂ rrega en entorns lliures d'errors.During last years, the Grid has emerged as a new platform for distributed computing. The Grid technology allows joining different resources from different administrative domains and forming a virtual supercomputer with all of them.Many research groups have dedicated their efforts to develop a set of basic services to offer a Grid middleware: a layer that enables the use of the Grid. Anyway, using these services is not an easy task for many end users, even more if their expertise is not related to computer science. This has a negative influence in the adoption of the Grid technology by the scientific community. They see it as a powerful technology but very difficult to exploit. In order to ease the way the Grid must be used, there is a need for an extra layer which hides all the complexity of the Grid, and allows users to program or port their applications in an easy way.There has been many proposals of programming tools for the Grid. In this thesis we give an overview on some of them, and we can see that there exist both Grid-aware and Grid-unaware environments (programmed with or without specifying details of the Grid respectively). Besides, very few existing tools can exploit the implicit parallelism of the application and in the majority of them, the user must define the parallelism explicitly. Another important feature we consider is if they are based in widely used programming languages (as C++ or Java), so the adoption is easier for end users.In this thesis, our main objective has been to create a programming model for the Grid based on sequential programming and well-known imperative programming languages, able to exploit the implicit parallelism of applications and to speed them up by using the Grid resources concurrently. Moreover, because the Grid has a distributed, heterogeneous and dynamic nature and also because the number of resources that form a Grid can be very big, the probability that an error arises during an application's execution is big. Thus, another of our objectives has been to automatically deal with any type of errors which may arise during the execution of the application (application related or Grid related).GRID superscalar (GRIDSs), the main contribution of this thesis, is a programming model that achieves these mentioned objectives by providing a very small and simple interface and a runtime that is able to execute in parallel the code provided using the Grid. Our programming interface allows a user to program a Grid-unaware application with already known and popular imperative languages (such as C/C++, Java, Perl or Shell script) and in a sequential fashion, therefore giving an important step to assist end users in the adoption of the Grid technology.We have applied our knowledge from computer architecture and microprocessor design to the GRIDSs runtime. As it is done in a superscalar processor, the GRIDSs runtime system is able to perform a data dependence analysis between the tasks that form an application, and to apply renaming techniques in order to increase its parallelism. GRIDSs generates automatically from user's main code a graph describing the data dependencies in the application.We present real use cases of the programming model in the fields of computational chemistry and bioinformatics, which demonstrate that our objectives have been achieved.Finally, we have studied the application of several fault detection and treatment techniques: checkpointing, task retry and task replication. Our proposal is to provide an environment able to deal with all types of failures, transparently for the user whenever possible. The main advantage in implementing these mechanisms at the programming model level is that application-level knowledge can be exploited in order to dynamically create a fault tolerance strategy for each application, and avoiding to introduce overhead in error-free environments
JOLTS : checkpointing and coordination in grid systems
The need for increased computational power is growing faster than our ability to produce faster computers. Already researchers are proposing systems that require peta-flop capable super computers, a far cry from what is currently capable. To meet such high computational requirements, networks of computers will be required. While it is possible to network together computers to achieve a single task, making that network more flexible to handle a multitude of different tasks is the promise of grid computing.
Grid systems are slowly appearing that are designed to run many independent tasks, and provide the ability for programs to migrate between machines before completion. However, these systems lack coordination capabilities. Many grid systems/environments allow multiple tasks to communicate/coordinate with each other based on various paradigms, but don't provide migration capabilities.
This thesis proposes a system, called JOLTS, that attempts to fill a gap by providing both checkpointing and coordination capabilities. The coordination model offered by JOLTS is based on the Objective Linda coordination language, with some additions. This thesis will show that the object space model is an effective form of coordination and communication, and can effectively be combined with checkpointing capabilities inside the same grid system
Department of Computer Science Activity 1998-2004
This report summarizes much of the research and teaching activity of the Department of Computer Science at Dartmouth College between late 1998 and late 2004. The material for this report was collected as part of the final report for NSF Institutional Infrastructure award EIA-9802068, which funded equipment and technical staff during that six-year period. This equipment and staff supported essentially all of the department\u27s research activity during that period
On the construction of decentralised service-oriented orchestration systems
Modern science relies on workflow technology to capture, process, and analyse data obtained from scientific instruments. Scientific workflows are precise descriptions of experiments in which multiple computational tasks are coordinated based on the dataflows between them. Orchestrating scientific workflows presents a significant research challenge: they are typically executed in a manner such that all data pass through a centralised computer server known as the engine, which causes unnecessary network traffic that leads to a performance bottleneck. These workflows are commonly composed of services that perform computation over geographically distributed resources, and involve the management of dataflows between them. Centralised orchestration is clearly not a scalable approach for coordinating services dispersed across distant geographical locations. This thesis presents a scalable decentralised service-oriented orchestration system that relies on a high-level data coordination language for the specification and execution of workflows. This system’s architecture consists of distributed engines, each of which is responsible for executing part of the overall workflow. It exploits parallelism in the workflow by decomposing it into smaller sub-workflows, and determines the most appropriate engines to execute them using computation placement analysis. This permits the workflow logic to be distributed closer to the services providing the data for execution, which reduces the overall data transfer in the workflow and improves its execution time. This thesis provides an evaluation of the presented system which concludes that decentralised orchestration provides scalability benefits over centralised orchestration, and improves the overall performance of executing a service-oriented workflow