    Task Runtime Prediction in Scientific Workflows Using an Online Incremental Learning Approach

    Many algorithms in workflow scheduling and resource provisioning rely on the performance estimation of tasks to produce a scheduling plan. A profiler that can model the execution of tasks and predict their runtime accurately therefore becomes an essential part of any Workflow Management System (WMS). With the emergence of multi-tenant Workflow as a Service (WaaS) platforms that use clouds for deploying scientific workflows, task runtime prediction becomes more challenging because it requires processing a significant amount of data in near real time while dealing with the performance variability of cloud resources. Hence, methods such as profiling tasks' execution data using basic statistical descriptions (e.g., mean, standard deviation) or batch offline regression techniques to estimate the runtime may not be suitable for such environments. In this paper, we propose an online incremental learning approach to predict the runtime of tasks in scientific workflows in clouds. To improve the performance of the predictions, we harness fine-grained resource monitoring data in the form of time-series records of CPU utilization, memory usage, and I/O activities that reflect the unique characteristics of a task's execution. We compare our solution to a state-of-the-art approach that exploits the resource monitoring data using a regression-based machine learning technique. In our experiments, the proposed strategy improves performance, in terms of error, by up to 29.89% compared to the state-of-the-art solutions. Comment: Accepted for presentation at the main conference track of the 11th IEEE/ACM International Conference on Utility and Cloud Computing

    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building increasingly complex applications to manage and process large data sets and to execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects worldwide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences among state-of-the-art Grid workflow systems, but also identifies the areas that need further research. Comment: 29 pages, 15 figures

    The evolution of bits and bottlenecks in a scientific workflow trying to keep up with technology: Accelerating 4D image segmentation applied to NASA data

    In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to do large-scale earth science research. The Pacific Research Platform (PRP) and the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to significantly decrease the earth science workflow's wall-clock time from 19.5 days to 53 minutes. The improvement in wall-clock time comes from the use of network appliances, improved image segmentation, deployment of a containerized workflow, and the increase in CI experience and training for the earth scientists. This paper describes the evolving innovations used to improve the workflow, the bottlenecks identified within each workflow version, and the improvements made within each version of the workflow over a three-year period.

    Towards distributed architecture for collaborative cloud services in community networks

    Internet and communication technologies have lowered the costs for communities to collaborate, leading to new services like user-generated content and social computing. Through collaboration, collectively built infrastructures like community networks have also emerged. Community networks are formed when individuals and local organisations from a geographic area team up to create and run a community-owned IP network to satisfy the community’s demand for ICT, such as facilitating Internet access and providing services of local interest. The consolidation of today’s cloud technologies now offers the possibility of collectively built community clouds, building upon user-generated content and user-provided networks towards an ecosystem of cloud services. To address the limitations and enhance the utility of community networks, we propose a collaborative distributed architecture for building a community cloud system that employs resources contributed by the members of the community network for provisioning infrastructure and software services. Such an architecture needs to be tailored to the specific social, economic and technical characteristics of the community networks for community clouds to be successful and sustainable. Through real deployments of clouds in community networks and evaluation of application performance, we show that community clouds are feasible. Our result may encourage collaborative, innovative cloud-based services made possible with the resources of a community. Peer Reviewed. Postprint (author’s final draft)

    Report of the user requirements and web based access for eResearch workshops

    The User Requirements and Web Based Access for eResearch Workshop, organized jointly by NeSC and NCeSS, was held on 19 May 2006. The aim was to identify lessons learned from e-Science projects that would contribute to our capacity to make Grid infrastructures and tools usable and accessible for diverse user communities. Its focus was on providing an opportunity for a pragmatic discussion between e-Science end users and tool builders in order to understand usability challenges, technological options, community-specific content and needs, and methodologies for design and development. We invited members of six UK e-Science projects and one US project, trying as far as possible to pair a user and developer from each project in order to discuss their contrasting perspectives and experiences. Three breakout group sessions covered the topics of user-developer relations, commodification, and functionality. There was also extensive post-meeting discussion, summarized here. Additional information on the workshop, including the agenda, participant list, and talk slides, can be found online at http://www.nesc.ac.uk/esi/events/685/ Reference: NeSC report UKeS-2006-07 available from http://www.nesc.ac.uk/technical_papers/UKeS-2006-07.pd

    Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining

    The notion of meta-mining has appeared recently and extends traditional meta-learning in two ways. First, it does not learn meta-models that support only the learning algorithm selection task, but ones that support the whole data-mining process. In addition, it abandons the so-called black-box approach to algorithm description followed in meta-learning. Now, in addition to the datasets, algorithms also have descriptors, and so do workflows. For the latter two, these descriptions are semantic, describing properties of the algorithms. With the availability of descriptors for both datasets and data mining workflows, the traditional modelling techniques followed in meta-learning, typically based on classification and regression algorithms, are no longer appropriate. Instead, we are faced with a problem whose nature is much more similar to the problems that appear in recommendation systems. The most important meta-mining requirements are that suggestions should use only dataset and workflow descriptors, and that the cold-start problem, e.g. providing workflow suggestions for new datasets, should be handled. In this paper we take a different view of the meta-mining modelling problem and treat it as a recommender problem. To account for the meta-mining specificities, we derive a novel metric-based-learning recommender approach. Our method learns two homogeneous metrics, one in the dataset space and one in the workflow space, and a heterogeneous one in the dataset-workflow space. All learned metrics reflect similarities established from the dataset-workflow preference matrix. We demonstrate our method on meta-mining over biological (microarray dataset) problems. The application of our method is not limited to the meta-mining problem; its formulation is general enough that it can be applied to problems with similar requirements.