18,825 research outputs found
Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach
Many algorithms in workflow scheduling and resource provisioning rely on the
performance estimation of tasks to produce a scheduling plan. A profiler that
is capable of modeling the execution of tasks and predicting their runtime
accurately, therefore, becomes an essential part of any Workflow Management
System (WMS). With the emergence of multi-tenant Workflow as a Service (WaaS)
platforms that use clouds for deploying scientific workflows, task runtime
prediction becomes more challenging because it requires the processing of a
significant amount of data in a near real-time scenario while dealing with the
performance variability of cloud resources. Hence, relying on methods such as
profiling tasks' execution data using basic statistical description (e.g.,
mean, standard deviation) or batch offline regression techniques to estimate
the runtime may not be suitable for such environments. In this paper, we
propose an online incremental learning approach to predict the runtime of tasks
in scientific workflows in clouds. To improve the performance of the
predictions, we harness fine-grained resources monitoring data in the form of
time-series records of CPU utilization, memory usage, and I/O activities that
are reflecting the unique characteristics of a task's execution. We compare our
solution to a state-of-the-art approach that exploits the resources monitoring
data based on regression machine learning technique. From our experiments, the
proposed strategy improves the performance, in terms of the error, up to
29.89%, compared to the state-of-the-art solutions.Comment: Accepted for presentation at main conference track of 11th IEEE/ACM
International Conference on Utility and Cloud Computin
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure
The evolution of bits and bottlenecks in a scientific workflow trying to keep up with technology: Accelerating 4D image segmentation applied to nasa data
In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to do large scale earth science research. The Pacific Research Platform (PRP) and The Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to significantly decreased the earth science workflow's wall-clock time from 19.5 days to 53 minutes. The improvement in wall-clock time comes from the use of network appliances, improved image segmentation, deployment of a containerized workflow, and the increase in CI experience and training for the earth scientists. This paper presents a description of the evolving innovations used to improve the workflow, bottlenecks identified within each workflow version, and improvements made within each version of the workflow, over a three-year time period
Towards distributed architecture for collaborative cloud services in community networks
Internet and communication technologies have lowered the costs for communities to collaborate, leading to new services like user-generated content and social computing, and through collaboration, collectively built infrastructures like community networks have also emerged. Community networks get formed when individuals and local organisations from a geographic area team up to create and run a community-owned IP network to satisfy the community’s demand for ICT, such as facilitating Internet access and providing services of local interest.
The consolidation of today’s cloud technologies offers now the possibility of collectively built community clouds, building upon user-generated content and user-provided networks towards an ecosystem of cloud services. To address the limitation and enhance utility of community networks, we propose a collaborative distributed architecture for building a community cloud system that employs resources contributed by the members of the community network for provisioning infrastructure and software services. Such architecture needs to be tailored to the specific social, economic and technical characteristics of the community networks for community clouds to be successful and sustainable. By real deployments of clouds in community networks and evaluation of application performance, we show that community clouds are feasible. Our result may encourage collaborative innovative cloud-based services made possible with the resources of a community.Peer ReviewedPostprint (author’s final draft
Report of the user requirements and web based access for eResearch workshops
The User Requirements and Web Based Access for eResearch Workshop, organized jointly by NeSC and NCeSS, was held on 19 May 2006. The aim was to identify lessons learned from e-Science projects that would contribute to our capacity to make Grid infrastructures and tools usable and accessible for diverse user communities. Its focus was on providing an opportunity for a pragmatic discussion between e-Science end users
and tool builders in order to understand usability challenges, technological options, community-specific content and needs, and methodologies for design and development. We invited members of six UK e-Science projects and one US project, trying as far as
possible to pair a user and developer from each project in order to discuss their contrasting perspectives and experiences. Three breakout group sessions covered the
topics of user-developer relations, commodification, and functionality. There was also extensive post-meeting discussion, summarized here.
Additional information on the workshop, including the agenda, participant list, and talk slides, can be found online at http://www.nesc.ac.uk/esi/events/685/
Reference: NeSC report UKeS-2006-07 available from http://www.nesc.ac.uk/technical_papers/UKeS-2006-07.pd
Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining
The notion of meta-mining has appeared recently and extends the traditional
meta-learning in two ways. First it does not learn meta-models that provide
support only for the learning algorithm selection task but ones that support
the whole data-mining process. In addition it abandons the so called black-box
approach to algorithm description followed in meta-learning. Now in addition to
the datasets, algorithms also have descriptors, workflows as well. For the
latter two these descriptions are semantic, describing properties of the
algorithms. With the availability of descriptors both for datasets and data
mining workflows the traditional modelling techniques followed in
meta-learning, typically based on classification and regression algorithms, are
no longer appropriate. Instead we are faced with a problem the nature of which
is much more similar to the problems that appear in recommendation systems. The
most important meta-mining requirements are that suggestions should use only
datasets and workflows descriptors and the cold-start problem, e.g. providing
workflow suggestions for new datasets.
In this paper we take a different view on the meta-mining modelling problem
and treat it as a recommender problem. In order to account for the meta-mining
specificities we derive a novel metric-based-learning recommender approach. Our
method learns two homogeneous metrics, one in the dataset and one in the
workflow space, and a heterogeneous one in the dataset-workflow space. All
learned metrics reflect similarities established from the dataset-workflow
preference matrix. We demonstrate our method on meta-mining over biological
(microarray datasets) problems. The application of our method is not limited to
the meta-mining problem, its formulations is general enough so that it can be
applied on problems with similar requirements
- …