    Different aspects of workflow scheduling in large-scale distributed systems

    As large-scale distributed systems gain momentum, the scheduling of workflow applications with multiple requirements in such computing platforms has become a crucial area of research. In this paper, we investigate the workflow scheduling problem in large-scale distributed systems from the Quality of Service (QoS) and data locality perspectives. We present a scheduling approach that considers two models of synchronization for the tasks in a workflow application: (a) communication through the network and (b) communication through temporary files. Specifically, we investigate via simulation the performance of a heterogeneous distributed system where multiple soft real-time workflow applications arrive dynamically. The applications are scheduled under various tardiness bounds, taking into account the communication cost in the first case study and the I/O cost and data locality in the second. The work presented in this paper has been partially supported by the EU, under the COST program Action IC1305, “Network for Sustainable Ultrascale Computing (NESUS)”, and by the Ministerio de Economía y Competitividad, Spain, under the project TIN2013-41350-P, “Scalable Data Management Techniques for High-End Computing Systems”.
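    To make the setting concrete, the sketch below shows how a greedy earliest-finish-time rule could account for the abstract's two synchronization models (network transfer vs. temporary files) and for tardiness under soft real-time deadlines. All names and cost formulas are illustrative assumptions, not the paper's actual scheduling approach.

        # Hedged sketch: greedy placement of one workflow task under the two
        # synchronization models described in the abstract. Cost formulas are
        # assumptions chosen for illustration only.
        from dataclasses import dataclass

        @dataclass
        class Processor:
            speed: float       # relative compute speed
            ready_time: float  # when the processor becomes free

        def estimated_finish(work, data_bytes, proc, parent_proc,
                             bandwidth, io_rate, via_files):
            """Finish time of a task on proc, given its parent ran on parent_proc."""
            if proc is parent_proc:
                comm = 0.0                       # data locality: no transfer needed
            elif via_files:
                comm = 2 * data_bytes / io_rate  # write temp file, then read it back
            else:
                comm = data_bytes / bandwidth    # direct network transfer
            return proc.ready_time + comm + work / proc.speed

        def tardiness(finish_time, deadline):
            # Soft real-time: a late task incurs tardiness instead of failing.
            return max(0.0, finish_time - deadline)

        def best_processor(work, data_bytes, procs, parent_proc,
                           bandwidth, io_rate, via_files):
            # Pick the processor minimizing the estimated finish time.
            return min(procs, key=lambda p: estimated_finish(
                work, data_bytes, p, parent_proc, bandwidth, io_rate, via_files))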

    A characterization of workflow management systems for extreme-scale applications

    Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last few years, scientific workflows have been established as an important abstraction that captures the data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists of the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.
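    The three classification axes the abstract names (execution model, computing environment, data access method) can be pictured as a simple profile per system. The sketch below is a hypothetical encoding of such a taxonomy; the axis values and the example entry are invented for illustration and are not the paper's actual categories.

        # Hypothetical schema for classifying a workflow management system
        # along the abstract's three axes; all values are illustrative only.
        from dataclasses import dataclass
        from enum import Enum

        class ExecutionModel(Enum):
            TASK_PARALLEL = "task-parallel"
            IN_SITU = "in situ"
            SERVICE_ORIENTED = "service-oriented"

        class DataAccess(Enum):
            SHARED_FS = "shared filesystem"
            OBJECT_STORE = "object store"
            IN_MEMORY = "in-memory staging"

        @dataclass
        class WMSProfile:
            name: str
            execution_models: set
            environments: set   # e.g. {"cluster", "cloud", "GPU"}
            data_access: set

        # Invented entry showing how one of the 15 systems might be profiled.
        example = WMSProfile("ExampleWMS",
                             {ExecutionModel.TASK_PARALLEL},
                             {"cluster", "cloud"},
                             {DataAccess.SHARED_FS})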

    Resource boxing: Converting realistic cloud task utilization patterns for theoretical scheduling

    Scheduling is a core component within distributed systems that determines the optimal allocation of tasks to servers. This is challenging within modern Cloud computing systems, which comprise millions of tasks executing in thousands of heterogeneous servers. Theoretical scheduling is capable of providing complete and sophisticated algorithms towards a single objective function. However, Cloud computing systems pursue multiple and oftentimes conflicting objectives towards provisioning high levels of performance, availability, reliability, and energy-efficiency. As a result, theoretical scheduling for Cloud computing relies on simplifying assumptions for applicability. This is especially true for task utilization patterns, which fluctuate in practice yet are modelled as piecewise constant in theoretical scheduling models. While there exists work on modelling dynamic Cloud task patterns for evaluating applied scheduling, such models are incompatible with the inputs needed for theoretical scheduling, which requires such patterns to be represented as boxes. Presently there exist no methods capable of accurately converting real task patterns derived from empirical data into boxes. This results in a significant gap that keeps theoreticians from understanding and proposing algorithms derived from realistic assumptions towards enhanced Cloud scheduling. This work proposes resource boxing: an approach for the automated conversion of realistic task patterns in Cloud computing directly into box inputs for theoretical scheduling. We propose four resource conversion algorithms capable of accurately representing real task utilization patterns in the form of scheduling boxes. The algorithms were evaluated using production Cloud trace data, demonstrating a difference between real utilization and scheduling boxes of less than 5%. We also provide an application showing how resource boxing can be exploited to directly translate research from the applied community into the theoretical community.
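    As a toy illustration of the conversion problem, the sketch below collapses a fluctuating utilization trace into a single constant box and measures the resulting error. Both conversion rules shown are naive baselines invented here; they are not the paper's four algorithms.

        # Two naive ways to turn a (duration, utilization) trace into one box,
        # plus the kind of relative-error metric behind the abstract's <5%
        # figure. Illustrative assumptions, not the paper's algorithms.
        def mean_box(trace):
            duration = sum(dt for dt, _ in trace)
            area = sum(dt * u for dt, u in trace)  # total resource-time consumed
            return duration, area / duration       # preserves total usage exactly

        def peak_box(trace):
            # Never under-provisions, but can over-estimate usage badly.
            return sum(dt for dt, _ in trace), max(u for _, u in trace)

        def boxing_error(trace, height):
            area = sum(dt * u for dt, u in trace)
            box_area = sum(dt for dt, _ in trace) * height
            return abs(box_area - area) / area

        trace = [(10, 0.2), (5, 0.9), (15, 0.4)]   # (seconds, CPU fraction)
        _, peak = peak_box(trace)
        print(boxing_error(trace, peak))           # 1.16: 116% over-provisioning

    The gap between this naive peak rule and the sub-5% difference the abstract reports is exactly what more careful conversion algorithms have to close.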

    A survey of the European Open Science Cloud services for expanding the capacity and capabilities of multidisciplinary scientific applications

    Open Science is a paradigm in which scientific data, procedures, tools and results are shared transparently and reused by society. The European Open Science Cloud (EOSC) initiative is an effort in Europe to provide an open, trusted, virtual and federated computing environment to execute scientific applications and store, share and reuse research data across borders and scientific disciplines. Additionally, scientific services are becoming increasingly data-intensive, not only in terms of computationally intensive tasks but also in terms of storage resources. To meet those resource demands, computing paradigms such as High-Performance Computing (HPC) and Cloud Computing are applied to e-science applications. However, adapting applications and services to these paradigms is a challenging task, commonly requiring a deep knowledge of the underlying technologies, which often constitutes a general barrier to their uptake by scientists. In this context, EOSC-Synergy, a collaborative project involving more than 20 institutions from eight European countries pooling their knowledge and experience to enhance EOSC’s capabilities and capacities, aims to bring EOSC closer to the scientific communities. This article provides a summary analysis of the adaptations made in the ten thematic services of EOSC-Synergy to embrace this paradigm. These services are grouped into four categories: Earth Observation, Environment, Biomedicine, and Astrophysics. The analysis leads to the identification of commonalities, best practices and common requirements, regardless of the thematic area of the service. Experience gained from the thematic services can be transferred to new services for the adoption of the EOSC ecosystem framework. The article makes several recommendations for the integration of thematic services in the EOSC ecosystem regarding Authentication and Authorization (mainly federated regional or thematic solutions based on EduGAIN), FAIR data and metadata preservation solutions (both for cataloguing and for data preservation, such as EUDAT’s B2SHARE), cloud platform-agnostic resource management services (such as Infrastructure Manager) and workload management solutions. This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 857647, EOSC-Synergy, European Open Science Cloud - Expanding Capacities by building Capabilities. Moreover, this work is partially funded by grant No 2015/24461-2, São Paulo Research Foundation (FAPESP). Francisco Brasileiro is a CNPq/Brazil researcher (grant 308027/2020-5). Article signed by 20 authors: Amanda Calatrava, Hernán Asorey, Jan Astalos, Alberto Azevedo, Francesco Benincasa, Ignacio Blanquer, Martin Bobak, Francisco Brasileiro, Laia Codó, Laura del Cano, Borja Esteban, Meritxell Ferret, Josef Handl, Tobias Kerzenmacher, Valentin Kozlov, Aleơ Kƙenek, Ricardo Martins, Manuel Pavesio, Antonio Juan Rubio-Montero, Juan Sánchez-Ferrero.

    Enabling dynamic and intelligent workflows for HPC, data analytics, and AI convergence

    The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulations and modelling of physical phenomena, current needs also require data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of proper programming models and environments that support the integration of HPC, DA, and AI, as well as the lack of tools to easily deploy and execute the workflows in HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for the HPC/DA/AI convergence. Based on this study, the paper identifies the challenges of a new workflow platform to manage complex workflows. Finally, it proposes a development approach for such a workflow platform that addresses these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. The proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project. This work has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Spain, Germany, France, Italy, Poland, Switzerland and Norway. In Spain, it has received complementary funding from MCIN/AEI/10.13039/501100011033, Spain and the European Union NextGenerationEU/PRTR (contracts PCI2021-121957, PCI2021-121931, PCI2021-121944, and PCI2021-121927). In Germany, it has received complementary funding from the German Federal Ministry of Education and Research (contracts 16HPC016K, 6GPC016K, 16HPC017 and 16HPC018). In France, it has received financial support from the Caisse des dépÎts et consignations (CDC) under the action PIA ADEIP (project Calculateurs). In Italy, it has been preliminarily approved for complementary funding by the Ministero dello Sviluppo Economico (MiSE) (ref. project prop. 2659). In Norway, it has received complementary funding from the Norwegian Research Council under project number 323825. In Switzerland, it has been preliminarily approved for complementary funding by the State Secretariat for Education, Research and Innovation (SERI). In Poland, it is partially supported by the National Centre for Research and Development under decision DWM/EuroHPCJU/4/2021. The authors also acknowledge financial support from MCIN/AEI/10.13039/501100011033, Spain through the “Severo Ochoa Programme for Centres of Excellence in R&D” under Grant CEX2018-000797-S, from the Spanish Government (contract PID2019-107255 GB) and from the Generalitat de Catalunya, Spain (contract 2017-SGR-01414). Anna Queralt is a Serra HĂșnter Fellow.
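    The HPCWaaS idea of registering a workflow once and then invoking it as a service can be conveyed with a small client-side sketch. The WaaSClient class and its methods below are hypothetical, invented here to illustrate the paradigm; they are not an API defined by eFlows4HPC.

        # Hypothetical sketch of HPC Workflow as a Service (HPCWaaS):
        # a developer registers a reusable workflow once; users then execute
        # it on a federated HPC site without handling deployment themselves.
        # All names are invented for illustration.
        import uuid

        class WaaSClient:
            def __init__(self):
                self.registry = {}  # workflow id -> (description, image)

            def register(self, description, image):
                # Developer step: publish a reusable workflow (e.g. a
                # simulation stage plus DA and AI stages) to the registry.
                wf_id = str(uuid.uuid4())
                self.registry[wf_id] = (description, image)
                return wf_id

            def execute(self, wf_id, site, inputs):
                # User step: run a registered workflow on a chosen HPC site;
                # deployment and data staging stay behind the interface.
                description, image = self.registry[wf_id]
                print(f"deploying {image} to {site}: {description}, "
                      f"inputs={inputs}")

        client = WaaSClient()
        wf = client.register("simulation + analytics + ML training",
                             "eflows/demo:1.0")
        client.execute(wf, site="hpc-site-A", inputs={"mesh": "wing.vtk"})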
    • 

    corecore