2,338 research outputs found

Data mining and fusion

Author: Addis M. J.
Choi F.
Taylor S. J.
Upstill C.
Watkins E. R.
Publication venue: s.n.
Publication date: 01/04/2006

Simulation of the performance of complex data-intensive workflows

Author: Llwaah Faris Adel Dawood
Publication venue: Newcastle University
Publication date: 01/01/2018

PhD Thesis. Recently, cloud computing has been used for analytical and data-intensive processes as it offers many attractive features, including resource pooling, on-demand capability and rapid elasticity. Scientific workflows use these features to tackle the problems of complex data-intensive applications. Data-intensive workflows are composed of many tasks that may involve large input data sets and produce large amounts of output data, and they typically run in highly dynamic environments. However, resources should be allocated dynamically as the demands of the workflow change, since over-provisioning increases cost and under-provisioning causes Service Level Agreement (SLA) violations and poor Quality of Service (QoS). Performance prediction of complex workflows is a necessary step prior to the deployment of the workflow. Performance analysis of complex data-intensive workflows is a challenging task due to the complexity of their structure, the diversity of big data, and data dependencies, in addition to the need to examine the performance of, and the challenges associated with, running these workflows in a real cloud. In this thesis, a solution is explored to address these challenges, using a Next Generation Sequencing (NGS) workflow pipeline as a case study, which may require hundreds or thousands of CPU hours to process a terabyte of data. We propose a methodology to model, simulate and predict the run time and the number of resources used by complex data-intensive workflows. One contribution of our simulation methodology is that it provides the ability to extract the simulation parameters (e.g., MIPS and BW values) that are required for constructing a training set, and to give a fairly accurate prediction of the run time for a given input on cluster sizes much larger than the ones used in training the prediction model. The proposed methodology permits the derivation of run time predictions based on historical data from the provenance files.
We present the run time prediction of the complex workflow by considering different cases of its running in the cloud, such as execution failure and library deployment time. In case of failure, the framework can apply the prediction only partially, considering the successful parts of the pipeline; in the other case, the framework can predict with or without considering the time to deploy libraries. To further improve the accuracy of prediction, we propose a simulation model that handles I/O contention

Newcastle University eTheses

CERN openlab Whitepaper on Future IT Challenges in Scientific Research

Author: Di Meglio Alberto
Gaillard Melissa
Purcell Andrew
Publication date: 01/01/2014

This whitepaper describes the major IT challenges in scientific research at CERN and several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories and also input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the view of their organisations and/or affiliates

ZENODO

CERN Document Server