Search CORE

8 research outputs found

Recommended from our members

Survivor: An Approach for Adding Dependability to Legacy Workflow Systems

Author: Greze Jean-Denis
Kaiser Gail E.
Kc Gaurav S.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2002
Field of study

Although they often provide critical services, most workflow systems are not dependable. There has been much literature on dependable/survivable distributed systems; most is concerned with developing new architectures, not adapting pre-existing ones. Additionally, the literature is focused on hardening, security-based defense, as opposed to recovery. For deployed systems, it is often infeasible to completely replace existing infrastructures; what is more pragmatic are ways in which existing distributed systems can be adapted to offer better dependability. In this paper, we outline a general architecture that can easily be retrofitted to legacy workflow systems in order to improve dependability and fault tolerance. We do this by monitoring enactment and replicating partial workflow states as tools for detection, analysis and recovery. We discuss some policies that can guide these mechanisms. Finally, we describe and evaluate our implementation, Survivor, which modified an existing workflow system provided by the Naval Research Lab

Columbia University Academic Commons

Flexible multi-layer virtual machine design for virtual laboratory in distributed systems and grids.

Author: Kim Dohan
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2005
Field of study

We propose a flexible Multi-layer Virtual Machine (MVM) design intended to improve efficiencies in distributed and grid computing and to overcome the known current problems that exist within traditional virtual machine architectures and those used in distributed and grid systems. This thesis presents a novel approach to building a virtual laboratory to support e-science by adapting MVMs within the distributed systems and grids, thereby providing enhanced flexibility and reconfigurability by raising the level of abstraction. The MVM consists of three layers. They are OS-level VM, queue VMs, and components VMs. The group of MVMs provides the virtualized resources, virtualized networks, and reconfigurable components layer for virtual laboratories. We demonstrate how our reconfigurable virtual machine can allow software designers and developers to reuse parallel communication patterns. In our framework, the virtual machines can be created on-demand and their applications can be distributed at the source-code level, compiled and instantiated in runtime. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .K56. Source: Masters Abstracts International, Volume: 44-03, page: 1405. Thesis (M.Sc.)--University of Windsor (Canada), 2005

Scholarship at UWindsor

不確実なネットワーク条件におけるロバストな経路選択方式

Author: Ranaweera Ravindra Sandaruwan
Publication venue
Publication date: 29/05/2019
Field of study

電気通信大学201

Creative Repository of Electro-Communications

Support for flexible and transparent distributed computing

Author: Liu H.
Publication venue: UCL (University College London)
Publication date: 28/04/2010
Field of study

Modern distributed computing developed from the traditional supercomputing community rooted firmly in the culture of batch management. Therefore, the field has been dominated by queuing-based resource managers and work flow based job submission environments where static resource demands needed be determined and reserved prior to launching executions. This has made it difficult to support resource environments (e.g. Grid, Cloud) where the available resources as well as the resource requirements of applications may be both dynamic and unpredictable. This thesis introduces a flexible execution model where the compute capacity can be adapted to fit the needs of applications as they change during execution. Resource provision in this model is based on a fine-grained, self-service approach instead of the traditional one-time, system-level model. The thesis introduces a middleware based Application Agent (AA) that provides a platform for the applications to dynamically interact and negotiate resources with the underlying resource infrastructure. We also consider the issue of transparency, i.e., hiding the provision and management of the distributed environment. This is the key to attracting public to use the technology. The AA not only replaces user-controlled process of preparing and executing an application with a transparent software-controlled process, it also hides the complexity of selecting right resources to ensure execution QoS. This service is provided by an On-line Feedback-based Automatic Resource Configuration (OAC) mechanism cooperating with the flexible execution model. The AA constantly monitors utility-based feedbacks from the application during execution and thus is able to learn its behaviour and resource characteristics. This allows it to automatically compose the most efficient execution environment on the fly and satisfy any execution requirements defined by users. Two policies are introduced to supervise the information learning and resource tuning in the OAC. The Utility Classification policy classifies hosts according to their historical performance contributions to the application. According to this classification, the AA chooses high utility hosts and withdraws low utility hosts to configure an optimum environment. The Desired Processing Power Estimation (DPPE) policy dynamically configures the execution environment according to the estimated desired total processing power needed to satisfy users’ execution requirements. Through the introducing of flexibility and transparency, a user is able to run a dynamic/normal distributed application anywhere with optimised execution performance, without managing distributed resources. Based on the standalone model, the thesis further introduces a federated resource negotiation framework as a step forward towards an autonomous multi-user distributed computing world

UCL Discovery

Autonomous grid scheduling using probabilistic job runtime scheduling

Author: Lazarević A.
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2008
Field of study

Computational Grids are evolving into a global, service-oriented architecture – a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline, and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job, and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and its properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error centroid of only 4.75%. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow making it well suited for implementation in the utility Grids of the future

UCL Discovery

OpenGrey Repository

Autonomous grid scheduling using probabilistic job runtime forecasting.

Author: Lazarevic A.
Publication venue: 'Queen Mary University of London'
Publication date: 01/01/2008
Field of study

Computational Grids are evolving into a global, service-oriented architecture a universal platform for delivering future computational services to a range of applications of varying complexity and resource requirements. The thesis focuses on developing a new scheduling model for general-purpose, utility clusters based on the concept of user requested job completion deadlines. In such a system, a user would be able to request each job to finish by a certain deadline. and possibly to a certain monetary cost. Implementing deadline scheduling is dependent on the ability to predict the execution time of each queued job. and on an adaptive scheduling algorithm able to use those predictions to maximise deadline adherence. The thesis proposes novel solutions to these two problems and documents their implementation in a largely autonomous and self-managing way. The starting point of the work is an extensive analysis of a representative Grid workload revealing consistent workflow patterns, usage cycles and correlations between the execution times of jobs and its properties commonly collected by the Grid middleware for accounting purposes. An automated approach is proposed to identify these dependencies and use them to partition the highly variable workload into subsets of more consistent and predictable behaviour. A range of time-series forecasting models, applied in this context for the first time, were used to model the job execution times as a function of their historical behaviour and associated properties. Based on the resulting predictions of job runtimes a novel scheduling algorithm is able to estimate the latest job start time necessary to meet the requested deadline and sort the queue accordingly to minimise the amount of deadline overrun. The testing of the proposed approach was done using the actual job trace collected from a production Grid facility. The best performing execution time predictor (the auto-regressive moving average method) coupled to workload partitioning based on three simultaneous job properties returned the median absolute percentage error eentroid of only 4.75CX. This level of prediction accuracy enabled the proposed deadline scheduling method to reduce the average deadline overrun time ten-fold compared to the benchmark batch scheduler. Overall, the thesis demonstrates that deadline scheduling of computational jobs on the Grid is achievable using statistical forecasting of job execution times based on historical information. The proposed approach is easily implementable, substantially self-managing and better matched to the human workflow making it well suited for implementation in the utility Grids of the future

UCL Discovery

Automatic Node Selection for High Performance Applications on Networks

Author: Bruce Lowekamp
Jaspal Subhlok
Peter Lieu
Publication venue: ACM Press
Publication date: 01/01/1999
Field of study

A central problem in executing performance critical parallel and distributed applications on shared networks is the selection of computation nodes and communication paths for execution. Automatic selection of nodes is complex as the best choice depends on the application structure as well as the expected availability of computation and communication resources. This paper presents a solution to this problem for realistic application and network scenarios. A new algorithm to jointly analyze computation and communication resources for different application demands is introduced and a framework for automatic node selection is developed on top of Remos, which is a query interface to network information. The paper reports results from a set of applications, including Airshed pollution modeling and magnetic resonance imaging, executing on a high speed network testbed. The results demonstrate that node selection is effective in enhancing application performance in the presence of computation load as well as network traffic. Under the network conditions used for experiments, the increase in execution time due to compute loads and network congestion was reduced by half with node selection. The node selection algorithms developed in this research are also applicable to dynamic migration of long running jobs

CiteSeerX

Crossref