526 research outputs found

    Executing Large Scale Scientific Workflows in Public Clouds

    Scientists in fields such as high-energy physics, earth science, and astronomy are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) to complete a scientific analysis. Because a workflow ensemble usually contains many sub-workflows, each with hundreds or thousands of jobs subject to precedence constraints, executing such an ensemble raises serious cost concerns even on elastic, pay-as-you-go cloud resources. In this thesis, we develop a set of methods to optimize the execution of large-scale scientific workflows in public clouds under both cost and deadline constraints, using a two-step approach. First, we present a set of methods to optimize the execution of a scientific workflow in public clouds, with the Montage astronomical mosaic engine running on Amazon EC2 as an example. Second, we address three main challenges in realizing the benefits of public clouds when executing large-scale workflow ensembles: (1) execution coordination, (2) resource provisioning, and (3) data staging. To this end, we develop a new pulling-based workflow execution system with a profiling-based resource provisioning strategy. Our results show that our system can achieve an 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. In addition, our evaluation using Montage workflow ensembles on Amazon EC2 clusters of around 1000 cores demonstrates the efficacy of our resource provisioning strategy in terms of cost-effectiveness within the deadline.
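The abstract does not describe how the pulling-based system is built, so the following is only a minimal sketch of the general idea: idle workers pull jobs from a shared ready queue, and a job becomes ready once all of its parents have finished. The function name `run_pull_based`, the unit job duration, and the Montage-like job names are assumptions made for illustration, not details from the thesis.

```python
from collections import deque


def run_pull_based(parents, num_workers):
    """Simulate workers pulling ready jobs from a shared queue.

    parents maps each job name to the jobs it depends on.
    Every job is assumed to take one round (a simplification).
    Returns a list of rounds, each listing the jobs pulled in that round.
    """
    remaining = {job: set(deps) for job, deps in parents.items()}
    children = {job: [] for job in parents}
    for job, deps in remaining.items():
        for dep in deps:
            children[dep].append(job)

    # Jobs with no unfinished parents can be pulled immediately.
    ready = deque(sorted(job for job, deps in remaining.items() if not deps))
    rounds = []
    while ready:
        # Each idle worker pulls at most one job per round.
        pulled = [ready.popleft() for _ in range(min(num_workers, len(ready)))]
        rounds.append(pulled)
        for done in pulled:
            for child in children[done]:
                remaining[child].discard(done)
                if not remaining[child]:
                    ready.append(child)  # all parents finished
    return rounds


# Hypothetical four-job workflow with precedence constraints.
workflow = {
    "project_1": [],
    "project_2": [],
    "diff": ["project_1", "project_2"],
    "mosaic": ["diff"],
}
print(run_pull_based(workflow, num_workers=2))
# [['project_1', 'project_2'], ['diff'], ['mosaic']]
```

In this sketch no central component assigns jobs to workers; the only coordination is the ready queue, which is one plausible reading of how a pull model can avoid per-job scheduling overhead.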

    MOLNs: A cloud platform for interactive, reproducible and scalable spatial stochastic computational experiments in systems biology using PyURDME

    Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools, a complex software stack, and large, scalable compute and data analysis resources because of the high computational cost of Monte Carlo workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This creates a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the types of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for developing sharable and reproducible distributed parallel computational experiments.

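    Provendo robustez a escalonadores de workflows sensíveis às incertezas da largura de banda disponível (Providing robustness to workflow schedulers sensitive to uncertainties in the available bandwidth)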

    Advisors: Edmundo Roberto Mauro Madeira, Luiz Fernando Bittencourt. Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação. Summary (translated from Portuguese): For schedulers of scientific applications modelled as workflows to derive efficient schedules in hybrid clouds, they must be given, in addition to a description of the applications' computational demand, information on the computing power of the available resources, especially data related to the available bandwidth. However, the imprecision of measurement tools causes the available-bandwidth information supplied to schedulers to differ from the real values that should be considered to obtain near-optimal schedules. Schedulers specially designed for hybrid clouds simply ignore the existence of such imprecision and end up producing misleading, low-performance schedules, which makes them sensitive to uncertain information. This thesis introduces a proactive procedure to provide a certain level of robustness to schedules derived from schedulers that were not designed to be robust against the uncertainties arising from the use of imprecise information given by network measurement tools. To turn uncertainty-sensitive schedules into schedules that are robust to these imprecisions, the procedure proposes a refinement (a deflation) of the bandwidth estimates before they are used by the non-robust scheduler. By using refined estimates of the available bandwidth, schedulers initially sensitive to uncertainty began to produce schedules with a certain level of robustness to these imprecisions. The effectiveness and efficiency of the proposed procedure are evaluated through simulation.
The schedules generated by schedulers that adopted the proposed procedure are therefore compared with those produced by the same schedulers without the procedure. The simulation results show that the proposed procedure is able to provide robustness to bandwidth-information uncertainty for schedules derived from schedulers that are not robust to such uncertainty. Additionally, this thesis proposes a scheduler for scientific applications composed of a set of workflows. The novelty of this scheduler is that it is flexible, that is, it allows the use of different categories of objective functions. Although the proposed flexibility is a novelty in the state of the art, this scheduler is also sensitive to bandwidth imprecision. However, the procedure proved able to provide it with robustness against such uncertainty. It is shown in this thesis that the proposed procedure increased the effectiveness and efficiency of non-robust workflow schedulers designed for hybrid clouds, since they began to produce schedules with a certain level of robustness in the presence of uncertain estimates of the available bandwidth. The procedure proposed in this thesis is thus an important tool for improving schedulers that are sensitive to uncertain bandwidth estimates and were designed for a computing environment where these values are imprecise by nature. Therefore, this thesis proposes a procedure that improves the execution of scientific applications in hybrid clouds. Abstract: To derive efficient schedules for the tasks of scientific applications modelled as workflows, schedulers need information on the application demands as well as on resource availability, especially regarding the available bandwidth.
However, the lack of precision of bandwidth estimates provided by monitoring/measurement tools should be considered by the scheduler to achieve near-optimal schedules. Uncertainties in the available bandwidth can result from imprecise network measurement and monitoring tools and/or their inability to estimate in advance, during the scheduling step, the real value of the bandwidth that will be available to the application. Schedulers specially designed for hybrid clouds simply ignore the inaccuracies of the given estimates and end up producing non-robust, low-performance schedules, which makes them sensitive to the uncertainties stemming from using these networking tools. This thesis introduces a proactive procedure to provide a certain level of robustness for schedules derived from schedulers that were not designed to be robust in the face of uncertain bandwidth estimates stemming from unreliable networking tools. To turn non-robust schedulers into robust schedulers, the procedure applies a deflation to imprecise bandwidth estimates before they are used as input to non-robust schedulers. By using refined (deflated) estimates of the available bandwidth, non-robust schedulers initially sensitive to these uncertainties started to produce robust schedules that are insensitive to these inaccuracies. The effectiveness and efficiency of the procedure in providing robustness to non-robust schedulers are evaluated through simulation. Schedules generated by induced-robustness schedulers through the use of the procedure are compared to those produced by sensitive schedulers. In addition, this thesis also introduces a flexible scheduler for a special case of scientific applications modelled as a set of workflows grouped into ensembles. Although the novelty of this scheduler is the replacement of objective functions according to the user's needs, it is still a non-robust scheduler.
However, the procedure was able to provide the robustness this flexible scheduler needs to produce robust schedules under uncertain bandwidth estimates. It is shown in this thesis that the proposed procedure enhanced the robustness of workflow schedulers designed especially for hybrid clouds, as they started to produce robust schedules in the presence of uncertainties stemming from using networking tools. The proposed procedure is an important tool to furnish robustness to non-robust schedulers that were originally designed to work in a computational environment where bandwidth estimates are very likely to vary and cannot be estimated precisely in advance, thereby improving the execution of scientific applications in hybrid clouds. Doctorate; Computer Science; Doctor in Computer Science; 2012/02778-6; FAPES
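The abstract says the procedure deflates bandwidth estimates before a non-robust scheduler uses them, but it does not say how the deflation is computed. The sketch below therefore assumes the simplest possible form, a single multiplicative factor, purely to illustrate the effect on planned transfer times. The names `deflate_bandwidth` and `transfer_time_s`, the link label, and the factor 0.5 are all hypothetical.

```python
def deflate_bandwidth(estimates_mbps, deflation_factor):
    """Scale every measured bandwidth estimate down by a fixed factor.

    A constant factor in (0, 1] is an assumption of this sketch;
    the thesis may derive the deflation differently.
    """
    if not 0.0 < deflation_factor <= 1.0:
        raise ValueError("deflation_factor must be in (0, 1]")
    return {link: bw * deflation_factor for link, bw in estimates_mbps.items()}


def transfer_time_s(data_mb, bandwidth_mbps):
    """Time in seconds to move data_mb megabytes over a link rated in Mbit/s."""
    return data_mb * 8 / bandwidth_mbps


# Hypothetical measured estimate for the link between private and public cloud.
measured = {"private->public": 100.0}
deflated = deflate_bandwidth(measured, deflation_factor=0.5)

# The scheduler then plans with the pessimistic value instead of the raw one.
raw_plan = transfer_time_s(1000, measured["private->public"])       # 80.0 s
robust_plan = transfer_time_s(1000, deflated["private->public"])    # 160.0 s
print(raw_plan, robust_plan)
```

Planning with the deflated value makes the scheduler budget more time for transfers, so a schedule can still hold if the real bandwidth turns out lower than measured; the cost is a more conservative plan when the measurement was accurate.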

    The Contemporary Affirmation of Taxonomy and Recent Literature on Workflow Scheduling and Management in Cloud Computing

    Cloud computing systems are preferred over traditional forms of computing such as grid computing, utility computing, and autonomic computing for their ease of access to computing, QoS preferences, SLA conformity, security, and performance offered with minimal supervision. A cloud workflow schedule, when designed efficiently, achieves optimal resource usage, balanced workloads, deadline-specific execution, cost control according to budget specifications, efficient energy consumption, etc., to meet the performance requirements of today's vast scientific and business needs. Business requirements arising under recent technologies like pervasive computing are motivating further advancements in cloud computing. In this paper we discuss some of the important literature published on cloud workflow scheduling.

    Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey

    In the modern era, workflows have been adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, such as scientific, data-intensive computing, and big data applications built on MapReduce and Hadoop. These complex applications are described using high-level workflow representations. With the emergence of cloud computing, scheduling in the cloud has become an important research topic. Consequently, the workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters and grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lie in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work presents a complete study of resource provisioning and scheduling algorithms in cloud environments, focusing on Infrastructure as a Service (IaaS). We provide a comprehensive understanding of existing scheduling techniques and offer insight into research challenges that may serve as future directions for researchers.

    Large-scale binding affinity calculations on commodity compute clouds

    In recent years, it has become possible to calculate binding affinities of compounds bound to proteins via rapid, accurate, precise and reproducible free energy calculations. This is imperative in drug discovery as well as personalized medicine. This approach is based on molecular dynamics (MD) simulations and draws on sequence and structural information of the protein and compound concerned. Free energies are determined by ensemble averages of many MD replicas, each of which requires hundreds of cores and/or GPU accelerators, which are now available on commodity cloud computing platforms; there are also requirements for initial model building and subsequent data analysis stages. To automate the process, we have developed a workflow known as the binding affinity calculator. In this paper, we focus on the software infrastructure and interfaces that we have developed to automate the overall workflow and execute it on commodity cloud platforms, in order to reliably predict binding affinities on time scales relevant to the domains of application, and we illustrate its application to two free energy methods.
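The statement that free energies are determined by ensemble averages over many MD replicas can be illustrated with a small sketch. It averages per-replica free energy estimates and, as an addition not mentioned in the abstract, reports the standard error of the mean as a simple measure of precision. The function name `ensemble_average` and the replica values are hypothetical; real workflows compute each replica's estimate from a full MD trajectory.

```python
from math import sqrt
from statistics import mean, stdev


def ensemble_average(replica_values):
    """Return (mean, standard error of the mean) over per-replica estimates."""
    n = len(replica_values)
    if n < 2:
        raise ValueError("need at least two replicas to estimate an error")
    average = mean(replica_values)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    standard_error = stdev(replica_values) / sqrt(n)
    return average, standard_error


# Hypothetical binding free energies (kcal/mol) from four independent replicas.
replicas = [-10.0, -11.0, -9.0, -10.0]
avg, err = ensemble_average(replicas)
print(f"binding free energy: {avg:.2f} +/- {err:.2f} kcal/mol")
```

Because the replicas are independent, they can run concurrently on separate cloud instances, with only this final averaging step needing all results.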