3,195 research outputs found

    rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

    Full text link
    Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a balanced load execution of such applications on high performance computing (HPC) systems. Large HPC systems are vulnerable to processors or node failures and perturbations in the availability of resources. Most self-scheduling approaches do not consider fault-tolerant scheduling or depend on failure or perturbation detection and react by rescheduling failed tasks. In this work, a robust dynamic load balancing (rDLB) approach is proposed for the robust self scheduling of independent tasks. The proposed approach is proactive and does not depend on failure or perturbation detection. The theoretical analysis of the proposed approach shows that it is linearly scalable and its cost decrease quadratically by increasing the system size. rDLB is integrated into an MPI DLS library to evaluate its performance experimentally with two computationally intensive scientific applications. Results show that rDLB enables the tolerance of up to (P minus one) processor failures, where P is the number of processors executing an application. In the presence of perturbations, rDLB boosted the robustness of DLS techniques up to 30 times and decreased application execution time up to 7 times compared to their counterparts without rDLB

    Evaluating the Robustness of Resource Allocations Obtained through Performance Modeling with Stochastic Process Algebra

    Get PDF
    Recent developments in the field of parallel and distributed computing has led to a proliferation of solving large and computationally intensive mathematical, science, or engineering problems, that consist of several parallelizable parts and several non-parallelizable (sequential) parts. In a parallel and distributed computing environment, the performance goal is to optimize the execution of parallelizable parts of an application on concurrent processors. This requires efficient application scheduling and resource allocation for mapping applications to a set of suitable parallel processors such that the overall performance goal is achieved. However, such computational environments are often prone to unpredictable variations in application (problem and algorithm) and system characteristics. Therefore, a robustness study is required to guarantee a desired level of performance. Given an initial workload, a mapping of applications to resources is considered to be robust if that mapping optimizes execution performance and guarantees a desired level of performance in the presence of unpredictable perturbations at runtime. In this research, a stochastic process algebra, Performance Evaluation Process Algebra (PEPA), is used for obtaining resource allocations via a numerical analysis of performance modeling of the parallel execution of applications on parallel computing resources. The PEPA performance model is translated into an underlying mathematical Markov chain model for obtaining performance measures. Further, a robustness analysis of the allocation techniques is performed for finding a robustmapping from a set of initial mapping schemes. The numerical analysis of the performance models have confirmed similarity with the simulation results of earlier research available in existing literature. When compared to direct experiments and simulations, numerical models and the corresponding analyses are easier to reproduce, do not incur any setup or installation costs, do not impose any prerequisites for learning a simulation framework, and are not limited by the complexity of the underlying infrastructure or simulation libraries

    SiL: An Approach for Adjusting Applications to Heterogeneous Systems Under Perturbations

    Full text link
    Scientific applications consist of large and computationally-intensive loops. Dynamic loop scheduling (DLS) techniques are used to load balance the execution of such applications. Load imbalance can be caused by variations in loop iteration execution times due to problem, algorithmic, or systemic characteristics (also, perturbations). The following question motivates this work: "Given an application, a high-performance computing (HPC) system, and both their characteristics and interplay, which DLS technique will achieve improved performance under unpredictable perturbations?" Existing work only considers perturbations caused by variations in the HPC system delivered computational speeds. However, perturbations in available network bandwidth or latency are inevitable on production HPC systems. Simulator in the loop (SiL) is introduced, herein, as a new control-theoretic inspired approach to dynamically select DLS techniques that improve the performance of applications on heterogeneous HPC systems under perturbations. The present work examines the performance of six applications on a heterogeneous system under all above system perturbations. The SiL proof of concept is evaluated using simulation. The performance results confirm the initial hypothesis that no single DLS technique can deliver best performance in all scenarios, while the SiL-based DLS selection delivered improved application performance in most experiments

    On Autonomic HPC Clouds

    Get PDF
    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015.The long tail of science using HPC facilities is looking nowadays to instant available HPC Clouds as a viable alternative to the long waiting queues of supercomputing centers. While the name of HPC Cloud is suggesting a Cloud service, the current HPC-as-a-Service is mainly an offer of bar metal, better named cluster-on-demand. The elasticity and virtualization benefits of the Clouds are not exploited by HPC-as-a-Service. In this paper we discuss how the HPC Cloud offer can be improved from a particular point of view, of automation. After a reminder of the characteristics of the Autonomic Cloud, we project the requirements and expectations to what we name Autonomic HPC Clouds. Finally, we point towards the expected results of the latest research and development activities related to the topics that were identified.The work related to Autonomic HPC Clouds is supported by the European Commission under grant agreement H2020-6643946 (CloudLightning). The CLoudLightning project proposal was prepared by eight partner institutions, three of them as earlier partners in the COST Action IC1305 NESUS, benefiting from its inputs for the proposal. The section related to Autonomic Clouds is supported by the Romanian UEFISCDI under grant agreement PN-II-ID-PCE-2011- 3-0260 (AMICAS)

    Modern software cybernetics: new trends

    Get PDF
    Software cybernetics research is to apply a variety of techniques from cybernetics research to software engineering research. For more than fifteen years since 2001, there has been a dramatic increase in work relating to software cybernetics. From cybernetics viewpoint, the work is mainly on the first-order level, namely, the software under observation and control. Beyond the first-order cybernetics, the software, developers/users, and running environments influence each other and thus create feedback to form more complicated systems. We classify software cybernetics as Software Cybernetics I based on the first-order cybernetics, and as Software Cybernetics II based on the higher order cybernetics. This paper provides a review of the literature on software cybernetics, particularly focusing on the transition from Software Cybernetics I to Software Cybernetics II. The results of the survey indicate that some new research areas such as Internet of Things, big data, cloud computing, cyber-physical systems, and even creative computing are related to Software Cybernetics II. The paper identifies the relationships between the techniques of Software Cybernetics II applied and the new research areas to which they have been applied, formulates research problems and challenges of software cybernetics with the application of principles of Phase II of software cybernetics; identifies and highlights new research trends of software cybernetic for further research

    Networking and Application Interface Technology for Wireless Sensor Network Surveillance and Monitoring

    Get PDF
    Distributed unattended ground sensor networks used in battlefield surveillance and monitoring missions, have proven to be valuable in providing a tactical information advantage required for command and control, intelligence, surveillance, and reconnaissance planning. Operational effectiveness for surveillance missions can be enhanced further through network centric capability, where distributed UGS networks have the ability to perform surveillance operations autonomously. NCC operation can be enhanced through UGSs having the ability to evaluate their awareness of the current joint surveillance environment, in order to provide the necessary adaptation to dynamic changes. NCC can also provide an advantage for UGS networks to self-manage their limited operational resources efficiently, according to mission objective priority. In this article, we present a cross-layer approach and highlight techniques that have potential to enable NCC operation within a mission-orientated UGS surveillance setting

    Modern software cybernetics: New trends

    Get PDF
    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Software cybernetics research is to apply a variety of techniques from cybernetics research to software engineering research. For more than fifteen years since 2001, there has been a dramatic increase in work relating to software cybernetics. From cybernetics viewpoint, the work is mainly on the first-order level, namely, the software under observation and control. Beyond the first-order cybernetics, the software, developers/users, and running environments influence each other and thus create feedback to form more complicated systems. We classify software cybernetics as Software Cybernetics I based on the first-order cybernetics, and as Software Cybernetics II based on the higher order cybernetics. This paper provides a review of the literature on software cybernetics, particularly focusing on the transition from Software Cybernetics I to Software Cybernetics II. The results of the survey indicate that some new research areas such as Internet of Things, big data, cloud computing, cyber-physical systems, and even creative computing are related to Software Cybernetics II. The paper identifies the relationships between the techniques of Software Cybernetics II applied and the new research areas to which they have been applied, formulates research problems and challenges of software cybernetics with the application of principles of Phase II of software cybernetics; identifies and highlights new research trends of software cybernetic for further research

    A systematic literature review on the use of artificial intelligence in energy self-management in smart buildings

    Get PDF
    Buildings are one of the main consumers of energy in cities, which is why a lot of research has been generated around this problem. Especially, the buildings energy management systems must improve in the next years. Artificial intelligence techniques are playing and will play a fundamental role in these improvements. This work presents a systematic review of the literature on researches that have been done in recent years to improve energy management systems for smart building using artificial intelligence techniques. An originality of the work is that they are grouped according to the concept of "Autonomous Cycles of Data Analysis Tasks", which defines that an autonomous management system requires specialized tasks, such as monitoring, analysis, and decision-making tasks for reaching objectives in the environment, like improve the energy efficiency. This organization of the work allows us to establish not only the positioning of the researches, but also, the visualization of the current challenges and opportunities in each domain. We have identified that many types of researches are in the domain of decision-making (a large majority on optimization and control tasks), and defined potential projects related to the development of autonomous cycles of data analysis tasks, feature engineering, or multi-agent systems, among others.European Commissio

    Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World

    Get PDF
    This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394 "Software Performance Engineering in the DevOps World". The seminar addressed the problem of performance-aware DevOps. Both, DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rise in importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these back to the development (Dev). However, so far, the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify cross-community collaboration, and to set the path for long-lasting collaborations towards performance-aware DevOps. The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as PostDocs or Junior Professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations
    • …
    corecore