
    Supporting Parallelism in Server-based Multiprocessor Systems

    Developing an efficient server-based real-time scheduling solution that supports dynamic task-level parallelism is now relevant not only to the high-performance computing niche but also to the desktop and embedded domains. This paper proposes a novel approach that combines the constant bandwidth server abstraction with a work-stealing load-balancing scheme which, while ensuring isolation among tasks, enables a task to execute on more than one processor at a given time instant. Comment: WiP Session of the 31st IEEE Real-Time Systems Symposium.
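    The abstract does not spell out the server rules. As a minimal sketch, assuming the textbook constant bandwidth server (CBS) rules, the Python below shows how a server with maximum budget Q and period T (bandwidth Q/T) assigns scheduling deadlines, which is what provides the isolation the abstract mentions. The class and method names are hypothetical, and the work-stealing part of the proposed approach is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ConstantBandwidthServer:
    """CBS with maximum budget Q per period T; the scheduler runs it by EDF on `deadline`."""
    Q: float               # maximum budget per server period
    T: float               # server period
    budget: float = 0.0    # remaining budget c
    deadline: float = 0.0  # current scheduling deadline d

    def on_job_arrival(self, r: float) -> None:
        """Call when a job arrives at time r while the server is idle."""
        # If the leftover budget, served before the old deadline, would exceed
        # the bandwidth Q/T, start a fresh period with a full budget.
        if self.budget >= (self.deadline - r) * self.Q / self.T:
            self.deadline = r + self.T
            self.budget = self.Q

    def consume(self, amount: float) -> None:
        """Charge executed time; on exhaustion, recharge and postpone the deadline."""
        self.budget -= amount
        # A real CBS drains the budget continuously; the loop also covers coarse charges.
        while self.budget <= 0:
            self.budget += self.Q
            self.deadline += self.T
```

    For example, a server with Q=2 and T=10 that receives a job at time 0 gets deadline 10; once its 2 units of budget are consumed, the budget is recharged and the deadline moves to 20. Postponing the deadline, rather than letting the server keep its priority, is what stops an overrunning task from taking bandwidth from others.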

    MARACAS: a real-time multicore VCPU scheduling framework

    This paper describes MARACAS, a multicore scheduling and load-balancing framework that addresses shared cache and memory bus contention. It builds upon prior work centered on virtual CPU (VCPU) scheduling. Threads are associated with VCPUs that have periodically replenished time budgets, and VCPUs are guaranteed to receive their periodic budgets even if they are migrated between cores. A load-balancing algorithm maps VCPUs to cores so that, once VCPU timing guarantees are met, surplus CPU cycles are distributed fairly. MARACAS uses surplus cycles to throttle the execution of threads running on specific cores when memory contention exceeds a certain threshold. This lets threads on other cores make better progress without interference from co-runners. Our scheduling framework features a novel memory-aware scheduling approach that uses performance counters to derive an average memory request latency. We show that latency-based memory throttling is more effective than rate-based memory access control in reducing bus contention. MARACAS also supports cache-aware scheduling and migration, using page recoloring to improve performance isolation among VCPUs. Experiments show how MARACAS reduces multicore resource contention, leading to improved task progress. http://www.cs.bu.edu/fac/richwest/papers/rtss_2016.pdf (accepted manuscript)
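    The abstract says MARACAS derives an average memory request latency from performance counters and throttles cores when contention passes a threshold, but it gives neither the formula nor the throttling policy. A minimal sketch, assuming the counters expose an occupancy count (in-flight requests summed over cycles) and a completed-request count, estimates latency with Little's law. The counter fields and the "throttle the heaviest requester" policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CoreCounters:
    """Hypothetical per-core counter deltas for one sampling interval."""
    occupancy_cycles: int  # in-flight memory requests summed over every cycle
    requests: int          # memory requests completed in the interval


def average_latency(c: CoreCounters) -> float:
    """Little's law: mean latency (cycles) = accumulated occupancy / completed requests."""
    return c.occupancy_cycles / c.requests if c.requests else 0.0


def core_to_throttle(samples: Dict[int, CoreCounters], threshold: float) -> Optional[int]:
    """Return the core to throttle this interval, or None if latency is acceptable."""
    total_occupancy = sum(s.occupancy_cycles for s in samples.values())
    total_requests = sum(s.requests for s in samples.values())
    if total_requests == 0 or total_occupancy / total_requests <= threshold:
        return None
    # Hypothetical policy: throttle the heaviest memory user so co-runners progress.
    return max(samples, key=lambda core: samples[core].requests)
```

    One intuition for triggering on latency rather than on request rate: a high request rate does not by itself mean the bus is congested, whereas rising latency directly reflects the queueing that co-runners experience.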

    Limits on Fundamental Limits to Computation

    An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by the periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented and compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits. Comment: 15 pages, 4 figures, 1 table.

    Architecture aware parallel programming in Glasgow parallel Haskell (GPH)

    General-purpose computing architectures are evolving quickly to become manycore and hierarchical, i.e. a core can communicate more quickly locally than globally. To be effective on such architectures, programming models must be aware of the communication hierarchy. This thesis investigates a programming model that aims to share the responsibility for task placement, load balance, thread creation and synchronisation between the application developer and the runtime system. Its main contribution is four new architecture-aware constructs for Glasgow parallel Haskell that exploit information about task size and aim to reduce communication for small tasks, preserve data locality, or distribute large units of work. We define a semantics for the constructs that specifies the sets of PEs (processing elements) that each construct identifies, and we check four properties of the semantics using QuickCheck. We report a preliminary investigation of architecture-aware programming models that abstract over the new constructs; in particular, we propose architecture-aware evaluation strategies and skeletons. We investigate three common paradigms (data parallelism, divide-and-conquer and nested parallelism) on hierarchical architectures with up to 224 cores. The results show that the architecture-aware programming model consistently delivers better speedup and scalability than existing constructs, together with a dramatic reduction in execution time variability.
    We also compare functional multicore technologies, reporting some of the first multicore results for the Feedback Directed Implicit Parallelism (FDIP) and semi-explicit parallelism (GpH and Eden) languages. The comparison reflects the growing maturity of the field by systematically evaluating four parallel Haskell implementations on a common multicore architecture, contrasting the programming effort each language requires with the parallel performance delivered. We investigate the minimum thread granularity required to achieve satisfactory performance for three parallel functional language implementations on a multicore platform. The results show that GHC-GUM requires a larger thread granularity than Eden and GHC-SMP, and that the required granularity rises as the number of cores rises.
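    Thread granularity is central to the results above. As an illustration only, written in Python rather than GpH (where one would use `par` and `pseq` from Control.Parallel), the sketch below applies a sequential cutoff to a divide-and-conquer sum: ranges are halved until they hold at most `cutoff` elements, so the cutoff directly bounds how many tasks are created. The function names and cutoff values are hypothetical, and because of CPython's global interpreter lock the thread pool demonstrates the task structure rather than real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_tasks(lo: int, hi: int, cutoff: int) -> List[Tuple[int, int]]:
    """Halve [lo, hi) recursively until each chunk has at most `cutoff` elements."""
    if hi - lo <= cutoff:
        return [(lo, hi)]  # small enough: run sequentially as one task
    mid = (lo + hi) // 2
    return split_tasks(lo, mid, cutoff) + split_tasks(mid, hi, cutoff)


def parallel_sum(xs: List[int], cutoff: int, workers: int = 4) -> int:
    """Sum xs by evaluating each leaf chunk as a separate pool task."""
    chunks = split_tasks(0, len(xs), cutoff)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: sum(xs[c[0]:c[1]]), chunks)
        return sum(partials)
```

    With 1000 elements, a cutoff of 100 yields 16 leaf tasks, while a cutoff of 1000 yields a single sequential task; raising the cutoff trades available parallelism for lower per-task overhead, which is the granularity question the thesis measures.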

    BAG : Managing GPU as buffer cache in operating systems

    This paper presents the design, implementation and evaluation of BAG, a system that manages the GPU as the buffer cache in operating systems. Unlike previous uses of GPUs, which have focused on their computational capabilities, BAG explores a new dimension in managing GPUs in heterogeneous systems: GPU memory is an exploitable but largely ignored resource. With carefully designed data structures and algorithms, such as a concurrent hash table, a log-structured data store for managing GPU memory, and highly parallel GPU kernels for garbage collection, BAG achieves good performance under various workloads. In addition, leveraging existing operating system abstractions not only makes the BAG implementation non-intrusive but also eases deployment.
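    The abstract names a concurrent hash table, a log-structured data store and garbage-collection kernels but gives no details. A minimal, single-threaded Python sketch of the log-structured idea, assuming a fixed-capacity log as a stand-in for GPU memory: writes always append, a hash index points at each key's newest record, and garbage collection compacts the log by keeping only live records. All names are hypothetical; BAG runs garbage collection as parallel GPU kernels, which this sketch does not attempt.

```python
from typing import Dict, List, Optional, Tuple


class LogStructuredCache:
    """Append-only log of (key, value) records indexed by a hash table."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity                # log slots, standing in for GPU memory
        self.log: List[Tuple[str, bytes]] = []  # records in append order
        self.index: Dict[str, int] = {}         # key -> slot of its newest record

    def put(self, key: str, value: bytes) -> None:
        if len(self.log) >= self.capacity:
            self.collect_garbage()
        if len(self.log) >= self.capacity:
            # A real buffer cache would evict a block here instead of failing.
            raise MemoryError("log is full of live records")
        self.index[key] = len(self.log)         # the newer record supersedes older ones
        self.log.append((key, value))

    def get(self, key: str) -> Optional[bytes]:
        slot = self.index.get(key)
        return None if slot is None else self.log[slot][1]

    def collect_garbage(self) -> None:
        """Compact the log, dropping records superseded by a later write."""
        live = [rec for slot, rec in enumerate(self.log) if self.index[rec[0]] == slot]
        self.log = live
        self.index = {key: slot for slot, (key, _) in enumerate(live)}
```

    Appending instead of updating in place turns every write into a sequential operation, and the hash index keeps lookups constant-time; the cost is periodic compaction, which is the work BAG offloads to highly parallel GPU kernels.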

    EDF scheduling of real-time tasks on multiple cores: Adaptive Partitioning vs. Global Scheduling

    This paper presents a novel migration algorithm for real-time tasks on multicore systems, based on migrating tasks only when strictly needed to respect their temporal constraints, and combines this algorithm with EDF scheduling. This new “adaptive migration” algorithm is evaluated through an extensive set of simulations, showing good performance compared with global and partitioned EDF: our results highlight that it provides a worst-case utilisation bound similar to partitioned EDF for hard real-time tasks and an empirical tardiness bound (like global EDF) for soft real-time tasks. The proposed scheduler is therefore effective for both hard and soft real-time workloads.
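    The key idea above is to migrate a task only when keeping it on its current core would break its temporal constraints. A minimal sketch of that placement decision, assuming implicit-deadline periodic tasks so that EDF on one core is feasible exactly when total utilization is at most 1. The function names and the least-loaded target choice are hypothetical, and the soft real-time tardiness aspect is not modeled.

```python
from typing import List, Optional


def edf_feasible(load: float, u: float) -> bool:
    """Uniprocessor EDF test for implicit-deadline tasks: utilization sum <= 1."""
    return load + u <= 1.0


def place_task(loads: List[float], home: int, u: float) -> Optional[int]:
    """Keep a task of utilization u on its home core if possible; migrate only if needed."""
    if edf_feasible(loads[home], u):
        loads[home] += u
        return home
    # Migration is the fallback: pick the least-loaded core that can still absorb u.
    targets = [i for i, load in enumerate(loads) if i != home and edf_feasible(load, u)]
    if not targets:
        return None  # no single core fits; the task would need to be split or rejected
    target = min(targets, key=lambda i: loads[i])
    loads[target] += u
    return target
```

    For per-core loads [0.5, 0.25], a task of utilization 0.25 homed on core 0 stays there; a further 0.5 task homed on core 0 no longer fits and moves to core 1, and a third 0.5 task fits nowhere.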

    A unified runtime system for heterogeneous multicore architectures

    Approaching the theoretical performance of heterogeneous multicore architectures equipped with specialized accelerators is challenging. Unlike regular CPUs, which can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program onto such coprocessors, the real challenge is to find a programming model that provides a unified view of all available computing units. In this paper, we present an original runtime system providing a high-level, unified execution model that allows seamless execution of tasks over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with an LU decomposition on both homogeneous machines (a 3.8× speedup on 4 cores) and heterogeneous machines (95% efficiency). We also show that "granularity-aware" scheduling can improve execution time by 35%.
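    The abstract mentions a codelet scheduler without detailing its policy. A minimal sketch, assuming a codelet bundles one implementation per architecture together with an expected execution time on each, and that the scheduler greedily maps each task to the worker with the earliest expected completion time. The cost numbers, names and policy are hypothetical, not the paper's scheduler.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Codelet:
    """A task kind with per-architecture implementations and expected run times."""
    name: str
    cost: Dict[str, float]  # arch -> expected execution time; missing arch = no implementation


@dataclass
class Worker:
    name: str
    arch: str
    available_at: float = 0.0  # time at which this worker drains its queue


def schedule(tasks: List[Codelet], workers: List[Worker]) -> List[Tuple[str, str]]:
    """Greedily map each task to the worker with the earliest expected completion time."""
    plan = []
    for codelet in tasks:
        options = [w for w in workers if w.arch in codelet.cost]
        best = min(options, key=lambda w: w.available_at + codelet.cost[w.arch])
        best.available_at += codelet.cost[best.arch]
        plan.append((codelet.name, best.name))
    return plan
```

    With a GPU four times faster than the CPU on a "gemm" codelet, the first three tasks go to the GPU, but the fourth goes to the CPU (ties favor the first listed worker) because the GPU queue has grown. Using every unit, not only the fastest one, is what a unified view of the machine makes possible.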

    System-wide Time vs. Density Tradeoff in Real-Time Multicore Fluid Scheduling

    Doctoral dissertation, Department of Electrical and Computer Engineering, College of Engineering, Graduate School, Seoul National University, August 2017. Advisor: Chang-Gun Lee (이창건).
    Recent parallel programming frameworks such as OpenCL and OpenMP give real-time tasks parallelization freedom. This freedom creates a time vs. density tradeoff in fluid scheduling: more parallelization reduces thread execution times but increases density. By exercising this tradeoff system-wide, this dissertation proposes a parameter-tuning method for real-time tasks that aims to maximize schedulability under multicore fluid scheduling. Experiments using both simulation and an actual implementation show that the proposed approach balances time and density well, improving schedulability by up to 80%.
    Contents:
    1 Introduction
      1.1 Motivation and Objective
      1.2 Approach
      1.3 Organization
    2 Related Work
      2.1 Real-Time Scheduling
        2.1.1 Workload Model
        2.1.2 Scheduling on Multicore Systems
        2.1.3 Period Control
        2.1.4 Real-Time Operating System
      2.2 Parallel Computing
        2.2.1 Parallel Computing Framework
        2.2.2 Shared Resource Management
    3 System-wide Time vs. Density Tradeoff with Parallelizable Periodic Single Segment Tasks
      3.1 Introduction
      3.2 Problem Description
      3.3 Motivating Example
      3.4 Proposed Approach
        3.4.1 Per-task Optimal Tradeoff of Time and Density
        3.4.2 Peak Density Minimization for a Task Group with the Same Period
        3.4.3 Heuristic Algorithm for System-wide Time vs. Density Tradeoff
      3.5 Experimental Results
        3.5.1 Simulation Study
        3.5.2 Actual Implementation Results
    4 System-wide Time vs. Density Tradeoff with Parallelizable Periodic Multi-segment Tasks
      4.1 Introduction
      4.2 Problem Description
      4.3 Extension to Parallelizable Periodic Multi-segment Task Model
        4.3.1 Peak Density Minimization for a Task Group of Multi-segment Tasks with Same Period
        4.3.2 Heuristic Algorithm for System-wide Time vs. Density Tradeoff
    5 Conclusion
      5.1 Summary
      5.2 Future Work
    References
    Appendix A: Period Harmonization
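    The dissertation abstract states the tradeoff only qualitatively. A minimal single-task sketch, assuming a hypothetical overhead model in which splitting work W into k threads gives each thread execution time e(k) = W/k + c(k-1), and using the fluid-scheduling view that a thread with execution time e and deadline D has density e/D. The task's density is then k·e(k)/D = (W + c·k(k-1))/D, which grows with k, so under this model the least-density choice is the smallest k whose threads still meet the deadline. The function names and overhead model are assumptions; the system-wide heuristic (peak density minimization across tasks sharing a period) is not modeled.

```python
from typing import Optional


def thread_exec_time(work: float, k: int, overhead: float) -> float:
    """Hypothetical model: even split of `work` plus overhead*(k-1) cost per thread."""
    return work / k + overhead * (k - 1)


def task_density(work: float, k: int, overhead: float, deadline: float) -> float:
    """Fluid scheduling runs each thread at rate e/D, so k threads need density k*e/D."""
    return k * thread_exec_time(work, k, overhead) / deadline


def least_density_parallelism(work: float, overhead: float, deadline: float,
                              max_threads: int) -> Optional[int]:
    """Smallest k whose threads meet the deadline, i.e. each thread's density e/D <= 1."""
    for k in range(1, max_threads + 1):
        if thread_exec_time(work, k, overhead) <= deadline:
            return k
    return None  # no degree of parallelism meets the deadline under this model
```

    For W = 12, c = 1 and D = 7, one thread (e = 12) misses the deadline, two threads (e = 7) just meet it with density 2, and three threads (e = 6) also meet it but need density 18/7, so extra parallelism only costs density once the deadline is met.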