1,195 research outputs found

    Extending snBench to Support Hierarchical and Configurable Scheduling

    Full text link
    It is useful in systems that must support multiple applications with various temporal requirements to allow application-specific policies to manage resources accordingly. However, there is a tension between this goal and the desire to control and police possibly malicious programs. The Java-based Sensor Execution Environment (SXE) in snBench presents a situation where such considerations add value to the system. Multiple applications can be run by multiple users with varied temporal requirements, some Real-Time and others best effort. This paper outlines and documents an implementation of a hierarchical and configurable scheduling system with which different applications can be executed using application-specific scheduling policies. Concurrently the system administrator can define fairness policies between applications that are imposed upon the system. Additionally, to ensure forward progress of system execution in the face of malicious or malformed user programs, an infrastructure for execution using multiple threads is described

    A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures

    Full text link
    Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm. We propose a basis, common terminology and functional factors upon which to analyze the two approaches of both paradigms. We discuss the concept of "Big Data Ogres" and their facets as means of understanding and characterizing the most common application workloads found across the two paradigms. We then discuss the salient features of the two paradigms, and compare and contrast the two approaches. Specifically, we examine common implementation/approaches of these paradigms, shed light upon the reasons for their current "architecture" and discuss some typical workloads that utilize them. In spite of the significant software distinctions, we believe there is architectural similarity. We discuss the potential integration of different implementations, across the different levels and components. Our comparison progresses from a fully qualitative examination of the two paradigms, to a semi-quantitative methodology. We use a simple and broadly used Ogre (K-means clustering), characterize its performance on a range of representative platforms, covering several implementations from both paradigms. Our experiments provide an insight into the relative strengths of the two paradigms. We propose that the set of Ogres will serve as a benchmark to evaluate the two paradigms along different dimensions.Comment: 8 pages, 2 figure

    Synthesis of Safe, QoS Extendible, Application Specific Schedulers for Heterogeneous Real-Time Systems

    Get PDF
    We present a new scheduler architecture, which permits adding QoS (quality of service) policies to the scheduling decisions. We also present a new scheduling synthesis method which allows a designer to obtain a safe scheduler for a particular application. Our scheduler architecture and scheduler synthesis method can be used for heterogeneous applications where the tasks communicate through various synchronization primitives. We present a prototype implementation of this scheduler architecture and related mechanisms on top of an open-source OS (operating system) for embedded systems

    COLAB:A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors

    Get PDF
    Funding: Partially funded by the UK EPSRC grants Discovery: Pattern Discovery and Program Shaping for Many-core Systems (EP/P020631/1) and ABC: Adaptive Brokerage for Cloud (EP/R010528/1); Royal Academy of Engineering under the Research Fellowship scheme.Increasingly prevalent asymmetric multicore processors (AMP) are necessary for delivering performance in the era of limited power budget and dark silicon. However, the software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only under restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single program workloads. What we do not have is a scheduler that can handle all runtime factors affecting AMP for multi-threaded multi-programmed workloads. This paper introduces the first general purpose asymmetry-aware scheduler for multi-threaded multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core assignment and thread selection decisions that still provide each application its fair share of the processor's time. We evaluate our approach using the GEM5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH2 benchmarks. Compared to the state-of-the art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25% and 5% to 15% on average depending on the hardware setup.Postprin

    Abstraction and performance, together at last: auto-tuning message-passing concurrency on the Java virtual machine

    Get PDF
    Performance tuning is the leading justification for breaking abstraction boundaries. We target this problem for message passing concurrency (MPC) abstractions on the Java Virtual Machine (JVM). Efficient mapping of MPC abstractions to threads is critical for performance, scalability, and CPU utilization; but tedious and time consuming to perform manually. We solve this problem by putting forth a technique for automatically mapping MPC abstractions to JVM threads. In general, this mapping cannot be found in polynomial time. Our surprising observation is that characteristics of MPC abstractions and their communication patterns can be very revealing, and can help determine the mapping. Our technique addresses a number of challenges that leads to improved performance: i) balancing the computations across JVM threads, ii) reducing the communication overheads, iii) utilizing the information about cache locality, and iv) mapping MPC abstractions to threads in a way that reduces the contention between JVM threads. We have realized our technique in the Panini language that has capsules as an MPC abstraction. We also compare our mapping technique against four default mapping techniques: thread-all, round-robin-task-all, random-task-all and work-stealing. Our evaluation on wide range of benchmark programs shows that our mapping technique can improve the performance by 30%-60% over default mapping techniques

    Lock inference for systems software

    Get PDF
    Journal ArticleWe have developed task scheduler logic (TSL) to automate reasoning about scheduling and concurrency in systems software. TSL can detect race conditions and other errors as well as supporting lock inference: the derivation of an appropriate lock implementation for each critical section in a system. Lock inference solves a number of problems in creating flexible, reliable, and efficient systems software. TSL is based on a notion of asymmetrical preemption relations and it exploits the hierarchical inheritance of scheduling properties that is common in systems software

    Effectively Mapping Linguistic Abstractions for Message-passing Concurrency to Threads on the Java Virtual Machine

    Get PDF
    Efficient mapping of message passing concurrency (MPC) abstractions to Java Virtual Machine (JVM) threads is critical for performance, scalability, and CPU utilization; but tedious and time consuming to perform manually. In general, this mapping cannot be found in polynomial time, but we show that by exploiting the local characteristics of MPC abstractions and their communication patterns this mapping can be determined effectively. We describe our MPC abstraction to thread mapping technique, its realization in two frameworks (Panini and Akka), and its rigorous evaluation using several benchmarks from representative MPC frameworks. We also compare our technique against four default mapping techniques: thread-all, round-robin-task-all, random-task-all and work-stealing. Our evaluation shows that our mapping technique can improve the performance by 30%-60% over default mapping techniques. These improvements are due to a number of challenges addressed by our technique namely: i) balancing the computations across JVM threads, ii) reducing the communication overheads, iii) utilizing information about cache locality, and iv) mapping MPC abstractions to threads in a way that reduces the contention between JVM threads

    Composable Scheduler Activations for Haskell

    Get PDF
    corecore