
    Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries

    This work introduces a runtime model for managing communication with support for latency-hiding. The model enables non-computer science researchers to exploit communication latency-hiding techniques seamlessly. For compiled languages, it is often possible to create efficient schedules for communication, but this is not the case for interpreted languages. By maintaining data dependencies between scheduled operations, it is possible to aggressively initiate communication and lazily evaluate tasks to allow maximal time for the communication to finish before entering a wait state. We implement a heuristic of this model in DistNumPy, an auto-parallelizing version of numerical Python that allows sequential NumPy programs to run on distributed memory architectures. Furthermore, we present performance comparisons for eight benchmarks with and without automatic latency-hiding. The results show that, in a stencil application, our model reduces the time spent waiting for communication by as much as a factor of 27, from a maximum of 54% to only 2% of the total execution time. Comment: PREPRINT
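    The idea of initiating communication early and evaluating dependent tasks lazily can be illustrated with a minimal sketch. This is not DistNumPy's implementation: `fetch_remote_block`, `local_compute`, and `run_with_latency_hiding` are hypothetical names, the sleep merely stands in for network latency, and a thread pool stands in for non-blocking communication.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_remote_block(block_id, delay=0.05):
    """Hypothetical stand-in for a non-blocking receive; sleep models latency."""
    time.sleep(delay)
    return [block_id] * 4


def local_compute(n):
    """Work that does not depend on any remote data."""
    return sum(i * i for i in range(n))


def run_with_latency_hiding(block_ids):
    """Assumes block_ids is non-empty."""
    with ThreadPoolExecutor(max_workers=len(block_ids)) as pool:
        # Aggressively initiate all communication up front.
        pending = {b: pool.submit(fetch_remote_block, b) for b in block_ids}
        # Overlap: run independent local work while transfers are in flight.
        local = local_compute(10_000)
        # Lazy evaluation: block on each transfer only when its data is needed.
        remote_total = sum(sum(pending[b].result()) for b in block_ids)
    return local, remote_total
```

    Because the waits are deferred until the remote data is actually consumed, the local computation overlaps with the simulated transfers instead of following them.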

    Mapping and Scheduling of Directed Acyclic Graphs on An FPFA Tile

    An architecture for a hand-held multimedia device requires components that are energy-efficient, flexible, and provide high performance. In the CHAMELEON [4] project we develop a coarse-grained reconfigurable device for DSP-like algorithms, the so-called Field Programmable Function Array (FPFA). FPFA devices are reminiscent of FPGAs, but with a matrix of Processing Parts (PPs) instead of CLBs. The design of the FPFA focuses on: (1) keeping each PP small to maximize the number of PPs that can fit on a chip; (2) providing sufficient flexibility; (3) low energy consumption; (4) exploiting the maximum amount of parallelism; (5) a strong support tool for FPFA-based applications. The challenge in providing compiler support for FPFA-based design stems from the flexibility of the FPFA structure. If we do not use the characteristics of the FPFA structure properly, the advantages of an FPFA may become its disadvantages. The GECKO project focuses on this problem. In this paper, we present a mapping and scheduling scheme for applications running on one FPFA tile. Applications are written in C, and the C code is translated into a Directed Acyclic Graph (DAG) [4]. This scheme can map a DAG directly onto the reconfigurable PPs of an FPFA tile. It tries to achieve low power consumption by exploiting locality of reference and high performance by exploiting maximum parallelism.
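    To make the notion of scheduling a DAG onto a fixed set of processing parts concrete, here is a generic level-by-level list scheduler. It is only a sketch and not the scheme of the paper: the function name `schedule_dag`, the dictionary encoding of dependencies, and the `num_pps` parameter are assumptions made for illustration, and locality of reference is not modelled.

```python
def schedule_dag(deps, num_pps):
    """Greedy list scheduling of a DAG onto num_pps processing parts.

    deps maps each operation to the set of operations it depends on.
    Returns a list of time steps, each holding at most num_pps operations.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    steps = []
    while remaining:
        # An operation is ready once all of its predecessors have executed.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown operation")
        # Issue as many ready operations as there are processing parts.
        batch = ready[:num_pps]
        steps.append(batch)
        done.update(batch)
        for op in batch:
            del remaining[op]
    return steps
```

    For example, with `deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}, "e": set()}` and two processing parts, the operations are issued in three steps, and independent operations share a step whenever a processing part is free.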

    Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems

    We introduce a new parallelization framework for scientific computing based on BDSC, an efficient automatic scheduling algorithm for parallel programs in the presence of resource constraints on the number of processors and their local memory size. BDSC extends Yang and Gerasoulis's Dominant Sequence Clustering (DSC) algorithm; it uses sophisticated cost models and addresses both shared and distributed parallel memory architectures. We describe BDSC, its integration within the PIPS compiler infrastructure and its application to the parallelization of four well-known scientific applications: Harris, ABF, equake and IS. Our experiments suggest that BDSC's focus on efficient resource management leads to significant parallelization speedups on both shared and distributed memory systems, improving upon DSC results, as shown by the comparison of the sequential and parallelized versions of these four applications running on both OpenMP and MPI frameworks.

    QoS and security-aware task assignment and scheduling in real-time systems

    Security issues in mission-critical real-time systems (e.g., command and control systems) are becoming increasingly important as there are growing needs for satisfying information assurance in these systems. In such systems, it is important to guarantee real-time deadlines along with the security requirements (e.g., confidentiality, integrity, and availability) of the applications. Traditionally, resource management in real-time systems has focused on meeting deadlines along with satisfying fault-tolerance and/or resource constraints. Such an approach is inadequate for accommodating security requirements in resource management algorithms. Based on the imprecise computation paradigm, a task can have several Quality of Service (QoS) levels, and a higher QoS result incurs a higher computational cost. Similarly, achieving a higher level of confidentiality requires stronger encryption, which incurs a higher computational cost. Therefore, there exists a tradeoff between the schedulability of the tasks on the one hand, and the accuracy (QoS) and security of the results produced on the other hand. This tradeoff must be carefully accounted for in the resource management algorithms. In this context, this dissertation makes the following contributions: (i) formulation of scheduling problems accounting for both deadline and security requirements of workloads in real-time systems, (ii) development of novel task allocation and scheduling algorithms for such workloads, and (iii) evaluation of the results through simulation studies and a limited test evaluation in one case. In particular, the following are the three key contributions. Firstly, the problem of scheduling a set of non-preemptable real-time tasks with security and QoS requirements with the goal of maximizing the integrated QoS and security of the system is addressed. This problem is formulated as a MILP, and its complexity is then proved to be NP-hard. Because the problem is NP-hard, an efficient online heuristic algorithm is developed.
Simulation studies for a wide range of workload scenarios showed that the proposed algorithm outperforms a set of baseline algorithms. Further, the proposed algorithm's performance is close to the optimal solution in a specific special case of the problem. Secondly, the static assignment and scheduling of a set of dependent real-time tasks, modeled as a Directed Acyclic Graph (DAG), with security and QoS requirements in a heterogeneous real-time system with the objective of maximizing the Total Quality Value (TQV) of the system is studied. This problem is formulated as a MINLP. Since this problem is NP-hard, a heuristic algorithm to maximize TQV while satisfying the security constraint of the system is developed. The proposed algorithm was evaluated through extensive simulation studies and compared to a set of baseline algorithms for variations of synthetic workloads. The proposed algorithm outperforms the baseline algorithms in all the simulated conditions for fully-connected and shared bus network topologies. Finally, the problem of dynamic assignment and scheduling of a set of dependent tasks with QoS and security requirements in a heterogeneous distributed system to maximize the system TQV is addressed. Because the problem is NP-hard, two heuristic algorithms to maximize the TQV of the system are proposed. The proposed algorithms were evaluated by extensive simulation studies and by a test experiment on the InfoSpher platform. The proposed algorithms outperform the baseline algorithms in most of the simulated conditions for fully-connected and shared bus network topologies.

    Optimizing iterative data-flow scientific applications using directed cyclic graphs

    Data-flow programming models have become a popular choice for writing parallel applications as an alternative to traditional work-sharing parallelism. They are better suited to write applications with irregular parallelism that can present load imbalance. However, these programming models suffer from overheads related to task creation, scheduling and dependency management, limiting performance and scalability when tasks become too small. At the same time, many HPC applications implement iterative methods or multi-step simulations that create the same directed acyclic graphs of tasks on each iteration. By giving application programmers a way to express that a specific loop is creating the same task pattern on each iteration, we can create a single task directed acyclic graph (DAG) once and transform it into a cyclic graph. This cyclic graph is then reused for successive iterations, minimizing task creation and dependency management overhead. This paper presents the taskiter, a new construct we propose for the OmpSs-2 and OpenMP programming models, allowing the use of directed cyclic task graphs (DCTG) to minimize runtime overheads. Moreover, we present a simple immediate successor locality-aware heuristic that minimizes task scheduling overhead by bypassing the runtime task scheduler. We evaluate the implementation of the taskiter and the immediate successor heuristic in 8 iterative benchmarks. 
Using small task granularities, we obtain a geometric mean speedup of 2.56x over the reference OmpSs-2 implementation, and a 3.77x and 5.2x speedup over the LLVM and GCC OpenMP runtimes, respectively. This work was supported in part by the European Union’s Horizon 2020/EuroHPC Research and Innovation Programme (DEEP-SEA) under Grant 955606; in part by the Spanish State Research Agency—Ministry of Science and Innovation, Generalitat de Catalunya, under Project PCI2021121958 and Project 2021-SGR-01007; in part by the Spanish Ministry of Science and Technology under Contract PID2019-107255GB; and in part by Severo Ochoa under Grant CEX2021-001148-S/MCIN/AEI/10.13039/501100011033. Peer reviewed. Postprint (published version).
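    The benefit of building a task graph once and reusing it on every iteration can be sketched in a few lines. This is an illustration of the general idea only, not the taskiter construct or any OmpSs-2 or OpenMP API: the class `ReusableTaskGraph` and its arguments are hypothetical, and the tasks are replayed sequentially rather than in parallel.

```python
from graphlib import TopologicalSorter


class ReusableTaskGraph:
    """Resolve task dependencies once, then replay the order each iteration."""

    def __init__(self, tasks, deps):
        # tasks: name -> callable(state); deps: name -> iterable of predecessors.
        self.tasks = tasks
        graph = {name: deps.get(name, ()) for name in tasks}
        # Dependency analysis is paid a single time, at construction.
        self.order = list(TopologicalSorter(graph).static_order())

    def run(self, state, iterations):
        for _ in range(iterations):
            # No task creation or dependency resolution inside the loop.
            for name in self.order:
                self.tasks[name](state)
        return state
```

    A usage example: with a `double` task that multiplies `state["x"]` by two and an `inc` task that depends on it and adds one, `ReusableTaskGraph(tasks, {"inc": ["double"]}).run({"x": 1}, 3)` applies the same two-task pattern three times while the dependency order is computed only once.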

    Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey

    In the modern era, workflows are adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, such as scientific and data-intensive computing, and big data applications such as MapReduce and Hadoop. These complex applications are described using high-level representations in workflow methods. With the emerging model of cloud computing technology, scheduling in the cloud has become an important research topic. Consequently, the workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters and grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lie in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work presents a complete study of resource provisioning and scheduling algorithms in the cloud environment, focusing on Infrastructure as a Service (IaaS). We provide a comprehensive understanding of existing scheduling techniques and an insight into research challenges that indicate possible future directions for researchers.