
    SELFISHMIGRATE: A Scalable Algorithm for Non-clairvoyantly Scheduling Heterogeneous Processors

    We consider the classical problem of minimizing the total weighted flow-time for unrelated machines in the online \emph{non-clairvoyant} setting. In this problem, a set of jobs $J$ arrive over time to be scheduled on a set of $M$ machines. Each job $j$ has processing length $p_j$, weight $w_j$, and is processed at a rate of $\ell_{ij}$ when scheduled on machine $i$. The online scheduler knows the values of $w_j$ and $\ell_{ij}$ upon arrival of the job, but is not aware of the quantity $p_j$. We present the {\em first} online algorithm that is {\em scalable} ($(1+\epsilon)$-speed $O(\frac{1}{\epsilon^2})$-competitive for any constant $\epsilon > 0$) for the total weighted flow-time objective. No non-trivial results were known for this setting, except for the most basic case of identical machines. Our result resolves a major open problem in online scheduling theory. Moreover, we show that no job needs more than a logarithmic number of migrations. We further extend our result and obtain a scalable algorithm for the objective of minimizing total weighted flow-time plus energy cost on unrelated machines. The key algorithmic idea is to let jobs migrate selfishly until they converge to an equilibrium. Towards this end, we define a game in which each job's utility is closely tied to the instantaneous increase in the objective that the job is responsible for, and each machine declares a policy that assigns priorities and execution speeds to jobs based on when they migrate to it. This has a spirit similar to coordination mechanisms, which attempt to achieve near-optimal welfare in the presence of selfish agents (jobs). To the best of our knowledge, this is the first work that demonstrates the usefulness of ideas from coordination mechanisms and Nash equilibria for designing and analyzing online algorithms.
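    The equilibrium idea above can be illustrated with a toy best-response loop: each job repeatedly moves to the machine that minimizes a proxy for the instantaneous weighted flow-time it is responsible for, until no job wants to deviate. This is only a rough sketch under an assumed cost definition, not the paper's actual SELFISHMIGRATE policies or analysis.

```python
# Toy best-response dynamics loosely inspired by the idea of jobs migrating
# selfishly until an equilibrium assignment is reached. The cost proxy below
# is an illustrative assumption, not the paper's machine policies.

def selfish_migrate(weights, rates, max_rounds=100):
    """weights[j]: weight of job j; rates[i][j]: speed of job j on machine i."""
    n_machines, n_jobs = len(rates), len(weights)
    assign = [0] * n_jobs                     # start all jobs on machine 0

    def cost(j, i):
        # Proxy for the instantaneous increase in weighted flow-time that
        # job j would be "responsible for" if it ran on machine i.
        load = sum(weights[k] for k in range(n_jobs) if assign[k] == i and k != j)
        return weights[j] * (load + weights[j]) / rates[i][j]

    for _ in range(max_rounds):
        moved = False
        for j in range(n_jobs):
            best = min(range(n_machines), key=lambda i: cost(j, i))
            if best != assign[j]:
                assign[j] = best
                moved = True
        if not moved:                         # no job wants to move: equilibrium
            break
    return assign

print(selfish_migrate([1.0, 2.0, 1.0], [[1.0, 0.5, 1.0], [2.0, 2.0, 0.2]]))
```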

    Power Modeling for Heterogeneous Processors

    As power becomes an ever more important design consideration, there is a need for accurate power models at all stages of the design process. While power models are available for CPUs and GPUs, only simple models are available for heterogeneous processors. We present a micro-benchmark-based modeling technique that can be used for chip multiprocessors (CMPs) and accelerated processing units (APUs). We use our approach to model power on an Intel Xeon CPU and an AMD Fusion heterogeneous processor. The resulting error rate for the Xeon’s model is below 3%, and is only 7% for the Fusion. We also present a method to reduce the number of benchmarks required to create these models. Instead of running micro-benchmarks for every combination of factors (e.g. different operations or memory access patterns), we cluster similar micro-benchmarks to avoid unnecessary runs. We show that it is possible to eliminate as many as 93% of the compute micro-benchmarks while still producing power models with less than a 10% error rate.
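    A minimal sketch of the clustering idea described above: group micro-benchmarks by their performance-counter signatures, keep one representative per cluster, and fit a linear power model from those representatives. The counter names, synthetic data, and model form here are illustrative assumptions, not the paper's actual methodology.

```python
# Cluster similar micro-benchmarks and fit a linear power model on the
# representatives only, mimicking the reduction in benchmarks to run.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
counters = rng.random((200, 4))            # e.g. IPC, cache misses, FP ops, mem BW
measured_power = counters @ np.array([10.0, 5.0, 8.0, 20.0]) + 35.0   # watts

# Keep one representative micro-benchmark per cluster.
km = KMeans(n_clusters=15, n_init=10, random_state=0).fit(counters)
reps = [np.where(km.labels_ == c)[0][0] for c in range(15)]

model = LinearRegression().fit(counters[reps], measured_power[reps])
pred = model.predict(counters)
print("mean abs error (W):", np.mean(np.abs(pred - measured_power)))
```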

    Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

    Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums into a correct resulting vector. On three heterogeneous processors from Intel, AMD and NVIDIA, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR. Comment: 22 pages, 8 figures, published at Parallel Computing (PARCO).
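    For orientation, here is a small, sequential sketch of the segmented-sum view of CSR SpMV that the speculative GPU/CPU scheme parallelizes; the speculation and CPU fix-up step are deliberately omitted, so this is only the reference formulation, not the paper's algorithm.

```python
# Segmented-sum formulation of CSR SpMV: multiply each nonzero by the matching
# x entry, then reduce each row's segment of the product array.
import numpy as np

def csr_spmv_segmented(row_ptr, col_idx, vals, x):
    products = vals * x[col_idx]                 # one product per nonzero
    # Segmented sum over row boundaries (trailing empty rows would need care).
    y = np.add.reduceat(products, row_ptr[:-1])
    y[np.diff(row_ptr) == 0] = 0.0               # empty rows contribute zero
    return y

# 3x3 example: [[1,0,2],[0,0,0],[0,3,4]] @ [1,1,1]
row_ptr = np.array([0, 2, 2, 4])
col_idx = np.array([0, 2, 1, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])
print(csr_spmv_segmented(row_ptr, col_idx, vals, np.ones(3)))  # [3. 0. 7.]
```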

    Scaling CUDA for Distributed Heterogeneous Processors

    The mainstream acceptance of heterogeneous computing and cloud computing is prompting a future of distributed heterogeneous systems. With current software development tools, programming such complex systems is difficult and requires extensive knowledge of network and processor architectures. Providing an abstraction of the underlying network, the message-passing interface (MPI) has been the standard tool for developing distributed applications in the high-performance community. The problem with MPI lies in its message-passing model, which is less expressive than the shared-memory model. Development of heterogeneous programming tools, such as OpenCL, has only begun recently. This thesis presents Phalanx, a framework that extends the virtual architecture of CUDA to distributed heterogeneous systems. Using MPI, Phalanx transparently handles intercommunication among distributed nodes. By using the shared-memory model, Phalanx simplifies the development of distributed applications without sacrificing the advantages of MPI. In one of the case studies, Phalanx achieves a 28x speedup compared with serial execution on a Core i7 processor.
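    The underlying MPI pattern that Phalanx automates can be sketched with mpi4py: the root rank partitions the input, each rank runs a local kernel, and results are gathered back. This is not the Phalanx API, only an illustration of the scatter/compute/gather structure it hides from the programmer.

```python
# Plain MPI data-parallel pattern (run with e.g. `mpiexec -n 4 python script.py`).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Root partitions the input across ranks; each rank runs its "kernel" locally.
data = [list(range(i * 4, (i + 1) * 4)) for i in range(size)] if rank == 0 else None
chunk = comm.scatter(data, root=0)
partial = [x * x for x in chunk]          # stand-in for a CUDA/OpenCL kernel
result = comm.gather(partial, root=0)

if rank == 0:
    print([x for part in result for x in part])
```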

    Saber: window-based hybrid stream processing for heterogeneous architectures

    Modern servers have become heterogeneous, often combining multicore CPUs with many-core GPGPUs. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not supported by current relational stream processing engines. For an engine to exploit a heterogeneous architecture, it must execute streaming SQL queries with sufficient data-parallelism to fully utilise all available heterogeneous processors, and decide how to use each in the most effective way. It must do this while respecting the semantics of streaming SQL queries, in particular with regard to window handling. We describe SABER, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. SABER executes window-based streaming SQL queries in a data-parallel fashion using all available CPU and GPGPU cores. Instead of statically assigning query operators to heterogeneous processors, SABER employs a new adaptive heterogeneous lookahead scheduling strategy, which increases the share of queries executing on the processor that yields the highest performance. To hide data movement costs, SABER pipelines the transfer of stream data between different memory types and the CPU/GPGPU. Our experimental comparison against state-of-the-art engines shows that SABER increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with small and large window sizes.
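    The adaptive scheduling idea can be illustrated with a toy router that keeps a running throughput estimate per processor and sends more batches to whichever processor has been performing better. This is a simplification for illustration under assumed update rules, not SABER's actual heterogeneous lookahead scheduling algorithm.

```python
# Route query batches in proportion to observed per-processor throughput.
import random

throughput = {"cpu": 1.0, "gpu": 1.0}     # running estimates (batches/sec)

def pick_processor():
    total = sum(throughput.values())
    r = random.uniform(0, total)
    return "cpu" if r < throughput["cpu"] else "gpu"

def record(proc, batches, seconds, alpha=0.2):
    # Exponentially weighted update of the throughput estimate.
    throughput[proc] = (1 - alpha) * throughput[proc] + alpha * (batches / seconds)

for _ in range(1000):
    proc = pick_processor()
    elapsed = 0.01 if proc == "gpu" else 0.03   # pretend the GPU is 3x faster here
    record(proc, 1, elapsed)

print({p: round(t, 1) for p, t in throughput.items()})
```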

    Scheduling Fork-Join Task Graphs to Heterogeneous Processors

    The scheduling of task graphs with communication delays has been extensively studied. Recently, new results for the common sub-case of fork-join shaped task graphs were published, including an EPTAS and polynomial algorithms for special cases. These new results modelled the target architecture as consisting of homogeneous processors. However, forms of heterogeneity are becoming more and more common in contemporary parallel systems, such as CPU--accelerator systems, with their two types of resources. In this work, we study the scheduling of fork-join task graphs with communication delays, which is representative of highly parallel workloads, onto heterogeneous systems of related processors. We present an EPTAS and some polynomial-time algorithms for special cases, such as equal processing costs or unlimited resources. Lastly, we briefly look at the above-described case of two resource types and its implications. It is interesting to note that all results here also apply to scheduling independent tasks with release times and deadlines. Comment: 14 pages.
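    To make the problem setting concrete, here is a small greedy heuristic for placing the parallel tasks of a fork-join graph onto related processors (each with its own speed), where a task placed away from the fork's processor only becomes ready after a communication delay. This is merely an illustration under assumed parameters; the paper's EPTAS and special-case algorithms are far more involved.

```python
# Greedy largest-first placement of the fork-join graph's parallel tasks.
def schedule_fork_join(work, speeds, comm_delay, home=0):
    """work[t]: cost of task t; speeds[p]: speed of related processor p."""
    finish = [0.0] * len(speeds)                 # per-processor finish time
    placement = {}
    for t in sorted(range(len(work)), key=lambda t: -work[t]):   # largest first
        def done(p):
            ready = 0.0 if p == home else comm_delay   # data from the fork arrives late
            return max(finish[p], ready) + work[t] / speeds[p]
        best = min(range(len(speeds)), key=done)
        finish[best] = done(best)
        placement[t] = best
    return placement, max(finish)                # makespan of the parallel phase

placement, makespan = schedule_fork_join(
    work=[4, 3, 3, 2, 1], speeds=[1.0, 2.0], comm_delay=1.0)
print(placement, makespan)
```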